OpsDash User Manual
1. Quick Start
1.1 Requirements
| Dependency | Minimum Version | Notes |
|---|---|---|
| Docker & Docker Compose | Docker 24+, Compose v2 | Runs all infrastructure containers |
| Python | 3.11+ | Backend runtime |
| Node.js | 18+ | Frontend build tooling |
| OS | Linux / macOS / WSL2 | Ubuntu 22.04+ recommended |
Hardware: 4-core CPU, 8 GB RAM, 50 GB disk (VictoriaMetrics retains 90 days of data).
1.2 Deployment Steps
Step 1: Start Infrastructure Containers
docker compose up -d
This launches 7 services:
| Container | Image | Host Port |
|---|---|---|
| opsdash-postgres | postgres:16 | 5433 |
| opsdash-redis | redis:7 | 6381 |
| opsdash-zabbix-db | postgres:16 | 5434 |
| opsdash-zabbix-server | zabbix/zabbix-server-pgsql:7.0 | 10051 |
| opsdash-zabbix-web | zabbix/zabbix-web-nginx-pgsql:7.0 | 8080 |
| opsdash-emqx | emqx/emqx:5.8 | 11883 / 28083 |
| opsdash-victoriametrics | victoriametrics/victoria-metrics:v1.106.1 | 8428 |
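Once the containers are up, a quick TCP check confirms that each host port in the table is reachable. The following is a minimal sketch using only the Python standard library; the helper names `port_open` and `check_services` are illustrative and not part of OpsDash, and an open port only shows that something is listening, not that the service has finished initializing.

```python
from __future__ import annotations

import socket

# Host ports copied from the container table above.
SERVICE_PORTS = {
    "opsdash-postgres": 5433,
    "opsdash-redis": 6381,
    "opsdash-zabbix-db": 5434,
    "opsdash-zabbix-server": 10051,
    "opsdash-zabbix-web": 8080,
    "opsdash-emqx (MQTT)": 11883,
    "opsdash-emqx (API)": 28083,
    "opsdash-victoriametrics": 8428,
}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Map each service name to whether its host port accepts connections."""
    return {name: port_open(host, port) for name, port in SERVICE_PORTS.items()}
```

Calling `check_services()` on the Docker host returns a dictionary such as `{"opsdash-postgres": True, ...}`; any `False` entry points at a container worth inspecting with `docker compose ps`.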
Step 2: Configure Environment Variables
Create a .env file in the backend/ directory:
DATABASE_URL=postgresql+asyncpg://opsdash:opsdash123@localhost:5433/opsdash
REDIS_URL=redis://localhost:6381/0
SECRET_KEY=your-production-secret-key-change-me
CREDENTIAL_KEY=your-32-byte-encryption-key
ZABBIX_URL=http://localhost:8080/api_jsonrpc.php
ZABBIX_API_TOKEN=
EMQX_API_URL=http://localhost:28083/api/v5
EMQX_API_KEY=
EMQX_API_SECRET=
VM_QUERY_URL=http://localhost:8428
VM_WRITE_URL=http://localhost:8428
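The placeholder values for SECRET_KEY and CREDENTIAL_KEY should be replaced with random secrets. The sketch below generates both with the Python standard library. It assumes that CREDENTIAL_KEY is expected in the standard Fernet key format (32 random bytes, URL-safe base64-encoded), as the Configuration section describes it as a Fernet encryption key; the function names are illustrative only.

```python
import base64
import secrets


def generate_secret_key(num_bytes: int = 32) -> str:
    """Random URL-safe string, suitable as a signing secret."""
    return secrets.token_urlsafe(num_bytes)


def generate_fernet_key() -> str:
    """32 random bytes, URL-safe base64-encoded (the Fernet key format)."""
    return base64.urlsafe_b64encode(secrets.token_bytes(32)).decode("ascii")


# Print lines that can be pasted into backend/.env.
print(f"SECRET_KEY={generate_secret_key()}")
print(f"CREDENTIAL_KEY={generate_fernet_key()}")
```

Keep the generated CREDENTIAL_KEY backed up: with any symmetric encryption scheme, data encrypted under a lost key cannot be recovered.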
Step 3: Install Backend Dependencies & Initialize Database
cd backend
pip install -r requirements.txt
python3 -m app seed
The seed script creates all database tables and populates initial data including the admin account (admin / admin123), sample departments, devices, and alerts.
Step 4: Start the Backend API
uvicorn app.main:app --reload --host 0.0.0.0 --port 8001
Step 5: Start the Frontend Dev Server
cd frontend
npm install
npm run dev
Frontend runs at http://localhost:3000 and proxies /api requests to the backend on port 8001.
1.3 First Login
- Open http://localhost:3000 in your browser
- Log in with the default admin credentials: admin / admin123
- You will be redirected to the Operations Dashboard
- Important: Change the default admin password immediately via User Management
2. Feature Guide
2.1 Operations Dashboard
Navigation: Sidebar → Daily Ops → Operations Dashboard
The dashboard is the system homepage, providing a comprehensive view of your infrastructure. It uses tabbed navigation with 5 tabs:
| Tab | Content |
|---|---|
| Overview | KPI cards + department summary + live alert ticker |
| Charts | Device status pie chart + alert trend bar chart |
| Resource Usage | Top N device utilization + global trend graphs |
| Health Heatmap | Device health matrix (grouped by location/type/department) |
| Ops Timeline | Alert / maintenance / change event timeline |
Dashboard data is updated in real time via WebSocket. When the monitoring engine detects device status changes or new alerts, the page refreshes automatically.
Filters: Six multi-select dimensions (department, device type, subtype, location, tags, manager). All tabs refresh in sync when filters change.
Custom Layout: Click the gear icon to show/hide modules per tab and adjust display order. Layout preferences are saved to browser local storage.
2.2 Alert Monitoring
Navigation: Sidebar → Daily Ops → Alerts → Live Alerts
The live alerts page supports multi-dimensional filtering: severity level, target type, alert source (zabbix/emqx/opsdash), resolution status, suppressed/excluded toggle, keyword search, and time range. All filter state is synced to the URL for easy sharing.
Alert Actions
- Acknowledge: Records who acknowledged the alert and when
- Resolve: Marks the alert as resolved, removing it from the active list
- Batch Operations: Select multiple alerts for bulk acknowledge or resolve
Alert Convergence Model
The system uses a "persistent open" model -- when the same target triggers the same rule condition continuously, no new alert record is created. Instead, the existing alert's repeat count is incremented. When the metric recovers, the open alert is automatically resolved.
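The convergence behaviour described above can be sketched as a small in-memory model. This is an illustration under stated assumptions, not the OpsDash implementation: the `Alert` and `AlertStore` names, the `(target, rule)` key, and the `on_evaluation` method are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Alert:
    target_id: str
    rule_id: str
    repeat_count: int = 1
    resolved: bool = False


class AlertStore:
    """In-memory stand-in for the alert table (hypothetical)."""

    def __init__(self) -> None:
        self._open: dict[tuple[str, str], Alert] = {}
        self.history: list[Alert] = []

    def on_evaluation(self, target_id: str, rule_id: str, breached: bool) -> Alert | None:
        key = (target_id, rule_id)
        existing = self._open.get(key)
        if breached:
            if existing is not None:
                # Same target, same rule, still breaching: bump the counter only.
                existing.repeat_count += 1
                return existing
            alert = Alert(target_id, rule_id)
            self._open[key] = alert
            self.history.append(alert)
            return alert
        if existing is not None:
            # Metric recovered: auto-resolve the open alert.
            existing.resolved = True
            del self._open[key]
        return existing
```

Three consecutive breaching evaluations for the same target and rule leave one record in `history` with `repeat_count == 3`; a new record is created only if the condition breaches again after a recovery.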
Alert Correlation Groups
When a network device failure causes downstream devices to go offline, the system automatically groups related alerts for efficient root cause analysis.
- Topology Cascade: BFS traversal of topology links when a network device goes down
- Location Correlation: Multiple devices down at the same location are auto-grouped
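To illustrate the topology cascade, the sketch below runs a breadth-first traversal from a failed network device and collects every device reachable over downstream links. The adjacency-dictionary format and the function name `downstream_devices` are assumptions made for the example; how OpsDash stores topology links is not specified here.

```python
from __future__ import annotations

from collections import deque


def downstream_devices(links: dict[str, list[str]], failed: str) -> list[str]:
    """Breadth-first walk from the failed device over downstream links."""
    seen = {failed}
    order: list[str] = []
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for child in links.get(node, []):
            if child not in seen:  # guards against loops and shared children
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical topology: one core switch feeding two access switches.
links = {
    "core": ["sw1", "sw2"],
    "sw1": ["srv1", "srv2"],
    "sw2": ["srv2", "cam1"],
}
print(downstream_devices(links, "core"))
```

Alerts raised on the returned devices shortly after the failure would be candidates for the same correlation group, with the failed device's alert as the lead.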
Root Cause Analysis
Critical and warning alerts offer root cause analysis, showing upstream link alerts, correlation group lead alerts, and co-located alerts to help pinpoint the source of the issue.
2.3 Alert Rules
Navigation: Sidebar → Daily Ops → Alerts → Alert Rules
Alert rules define threshold-based automatic alerting. All enabled rules are evaluated every 60 seconds against metric data from VictoriaMetrics.
Key rule settings:
- Scope: Global / condition-based matching (9 dimensions) / specific targets
- Metric Name: Auto-discovered from VictoriaMetrics, searchable dropdown
- Comparison: gt / lt / gte / lte / eq / neq
- Dual Thresholds: Warning and Critical levels
- Duration: Seconds the metric must exceed the threshold before triggering
- Cooldown: Controls repeat notification frequency (default 300s)
- Notification Channels: Select specific channels or broadcast to all enabled ones
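The interaction between comparison, dual thresholds, duration, and cooldown can be sketched as follows. This is a simplified model under assumptions, not the actual rule engine: samples are `(timestamp_seconds, value)` pairs sorted oldest first, a rule fires only when every sample in the trailing duration window breaches, and all function names are hypothetical.

```python
import operator

COMPARATORS = {
    "gt": operator.gt, "lt": operator.lt, "gte": operator.ge,
    "lte": operator.le, "eq": operator.eq, "neq": operator.ne,
}


def classify(value, comparison, warning, critical):
    """Return 'critical', 'warning', or None for a single sample."""
    compare = COMPARATORS[comparison]
    if critical is not None and compare(value, critical):
        return "critical"
    if warning is not None and compare(value, warning):
        return "warning"
    return None


def evaluate(samples, comparison, warning, critical, duration_s):
    """Severity that held for the whole trailing window, or None."""
    if not samples:
        return None
    window_start = samples[-1][0] - duration_s
    # Require history reaching back to the window start, so that a
    # single fresh spike does not fire a rule with a long duration.
    if samples[0][0] > window_start:
        return None
    levels = [classify(v, comparison, warning, critical)
              for ts, v in samples if ts >= window_start]
    if all(level == "critical" for level in levels):
        return "critical"
    if all(level is not None for level in levels):
        return "warning"
    return None


def should_notify(last_notified_at, now, cooldown_s=300):
    """Suppress repeat notifications until the cooldown has elapsed."""
    return last_notified_at is None or now - last_notified_at >= cooldown_s
```

For example, with `comparison="gt"`, a warning threshold of 80, a critical threshold of 90, and a 120-second duration, the samples `[(0, 50), (60, 85), (120, 88), (180, 91)]` evaluate to `"warning"`: the last three samples all exceed 80, but only the final one exceeds 90.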
Alert Exclusion Rules
Navigation: Sidebar → Daily Ops → Alerts → Alert Exclusions
Alert exclusion rules filter out alerts that do not require attention (e.g., during planned maintenance). Matched alerts are still recorded but auto-resolved without sending notifications.
Three time modes: Permanent, One-time (start/end time), and Recurring (periodic maintenance windows).
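As a rough illustration of the three time modes, the function below decides whether an exclusion rule is active at a given moment. The rule dictionary layout is an assumption for this sketch (the real schema is not documented here), and the recurring mode is modelled as a set of weekdays plus a daily start/end time that does not cross midnight.

```python
from datetime import datetime, time


def exclusion_active(rule: dict, now: datetime) -> bool:
    """Return True if the exclusion rule applies at `now`."""
    mode = rule["mode"]
    if mode == "permanent":
        return True
    if mode == "one_time":
        return rule["start"] <= now < rule["end"]
    if mode == "recurring":
        # Assumed shape: weekdays (Monday=0) and a same-day time window.
        if now.weekday() not in rule["weekdays"]:
            return False
        return rule["start_time"] <= now.time() < rule["end_time"]
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical weekly maintenance window: Saturdays, 01:00 to 03:00.
saturday_window = {
    "mode": "recurring",
    "weekdays": {5},
    "start_time": time(1, 0),
    "end_time": time(3, 0),
}
print(exclusion_active(saturday_window, datetime(2024, 1, 6, 2, 0)))
```

An alert matched while the rule is active would still be recorded, but auto-resolved without notification, as described above.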
2.4 Device Management
Navigation: Sidebar → Asset Management → Devices
Device List Features
- Multi-field search: Searches device name, hostname, IP, MAC address, and serial number simultaneously
- Rich filters: Department, device type, subtype, brand, status, zone, rack, tags, favorites
- Agent status column: Shows Zabbix Agent installation state per device
- URL state persistence: Filter state is synced to the URL and restored on navigation
- Import/Export: Bulk import/export in CSV/Excel format with automatic MAC/IP deduplication
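Preparing an import file is easier if duplicates are removed beforehand. The sketch below shows one plausible way to deduplicate rows by MAC and IP address; it assumes rows are dictionaries with optional `mac` and `ip` keys and that the first occurrence wins. The exact rules OpsDash applies during import may differ.

```python
import re


def normalize_mac(mac: str) -> str:
    """Convert common MAC notations to lowercase colon-separated form."""
    hex_digits = re.sub(r"[^0-9A-Fa-f]", "", mac)
    if len(hex_digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(hex_digits[i:i + 2] for i in range(0, 12, 2)).lower()


def deduplicate(rows):
    """Keep the first row for each MAC or IP; drop later duplicates."""
    seen_macs, seen_ips = set(), set()
    kept = []
    for row in rows:
        mac = normalize_mac(row["mac"]) if row.get("mac") else None
        ip = row.get("ip") or None
        if (mac and mac in seen_macs) or (ip and ip in seen_ips):
            continue
        if mac:
            seen_macs.add(mac)
        if ip:
            seen_ips.add(ip)
        kept.append(row)
    return kept
```

Normalizing first means `AA-BB-CC-DD-EE-FF` and `aabb.ccdd.eeff` are treated as the same address.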
Device Types
10 supported types: network, server, security, database, storage, industrial, iot, ups, hvac, custom.
Device Detail Page
9 tabs: Basic Info, Link Relations, Software, Alerts, Monitoring Data, IoT Location, Agent Management, Change Log, and Custom Fields.
Device status updates in real time via WebSocket. Top-level stats include 30-day availability (SLA), downtime duration, and status reason.
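For reference, the relationship between downtime and the 30-day availability figure can be worked through as below. The formula assumes availability is simply the share of the window not spent in downtime; the function name is illustrative and the rounding OpsDash applies is not specified.

```python
def availability_percent(downtime_seconds: float, window_days: int = 30) -> float:
    """Share of the window not spent in downtime, as a percentage."""
    window_seconds = window_days * 24 * 60 * 60
    # Clamp so bad input cannot produce values outside 0..100.
    downtime = min(max(downtime_seconds, 0), window_seconds)
    return round(100 * (1 - downtime / window_seconds), 3)


# 2592 seconds (43.2 minutes) of downtime in a 30-day window.
print(availability_percent(2592))
```

A 30-day window contains 2,592,000 seconds, so 2,592 seconds of downtime corresponds to 99.9% availability.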
Auto-Discovery
Enter a management IP and click "Auto Detect" -- the system probes the device (TCP scan, banner grab, SNMP query, fingerprint fusion) and fills in device type, brand/model, and more.
Batch Operations
Select multiple devices for: bulk delete, bulk update, bulk agent install/upgrade/uninstall/health check, config update, bulk template bind/unbind.
Agent Installation
One-click remote Zabbix Agent installation covers Ubuntu/Debian/CentOS/RHEL/Rocky/Alma on amd64/arm64/armhf architectures. Installation is offline: target machines do not need internet access. Batch installation is also supported.
2.5 Software Assets
Software instances are attached to server-type devices. Supports department/type filtering. Detail page includes 5 tabs: Basic Info, Dependencies, Alerts, Monitoring Data, Change Log.
For full details, see the Help Documentation within OpsDash.
2.6 Topology
Interactive network topology powered by AntV G6 5.0. Features node/link visualization, toolbar operations (layout switching, search, export), fault impact analysis, path analysis, historical playback, and multi-version comparison.
For full details, see the Help Documentation within OpsDash.
2.7 IoT Device Management
Deep IoT management module: device models (thing models), telemetry dashboards (gauges + historical charts), alert templates, geofencing (circle/polygon + enter/exit events), device command dispatch, batch configuration, firmware management and OTA upgrades, gateway topology, data quality dashboard.
For full details, see the Help Documentation within OpsDash.
2.8 Metrics Viewer
ECharts-based MetricChart component auto-discovers all device metrics in VictoriaMetrics and groups them by category. Supports line chart, bar chart, and gauge display modes.
For full details, see the Help Documentation within OpsDash.
2.9 Remote Operations
Batch command/script execution (10 vendor CLI profiles), script library, execution strategies (rolling/canary/grouped), risk assessment and approval workflows, DAG workflow orchestration, config backup with diff comparison, IPMI out-of-band management, execution statistics.
For full details, see the Help Documentation within OpsDash.
2.10 Network Discovery
Automated network scanning based on Zabbix Discovery: create scan tasks, auto-detect devices, review discovery results, one-click or batch onboarding. Supports TCP port scanning, SNMP probing, and ICMP ping.
For full details, see the Help Documentation within OpsDash.
2.11 Department Management
Multi-tenant department management: create, edit, and delete departments; devices and software are isolated per department; department-level role assignment is supported.
For full details, see the Help Documentation within OpsDash.
2.12 Users & Permissions
RBAC permission system: 5 built-in roles (admin/engineer/viewer/dept_engineer/dept_viewer), 43 permission strings across 21 resources. Supports global and department-scoped roles (one user can have different roles in different departments). SSO integration with Feishu, DingTalk, and WeCom.
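The combination of global and department-scoped roles can be pictured with the sketch below. The role names come from the list above, but the permission strings and the role-to-permission mapping are hypothetical placeholders, and `has_permission` is not an OpsDash function.

```python
# Hypothetical mapping; the real system defines 43 permission strings.
ROLE_PERMISSIONS = {
    "admin": {"device:read", "device:write", "user:manage"},
    "engineer": {"device:read", "device:write"},
    "viewer": {"device:read"},
    "dept_engineer": {"device:read", "device:write"},
    "dept_viewer": {"device:read"},
}


def has_permission(assignments, permission, department_id=None):
    """Check a permission against (role, department_id) assignments.

    A department_id of None in an assignment means the role is global.
    """
    for role, scope in assignments:
        if permission not in ROLE_PERMISSIONS.get(role, set()):
            continue
        if scope is None or scope == department_id:
            return True
    return False


# One user, different roles in two departments (hypothetical IDs).
user_roles = [("dept_engineer", "network"), ("dept_viewer", "finance")]
print(has_permission(user_roles, "device:write", "network"))
print(has_permission(user_roles, "device:write", "finance"))
```

In this model the user can modify devices in the network department but only view them in finance, and has no access to any other department.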
For full details, see the Help Documentation within OpsDash.
2.13 Credential Management
Navigation: Sidebar → Operations → Remote Ops → Credentials
Centralized credential storage for all monitoring protocols. Supports 10 credential types (SNMP/SSH/Database/JMX/IPMI/WMI/ONVIF/K8s/HTTP/API Key), 4 scopes (global/subnet/device_type/device), encrypted at rest.
Key features:
- Ownership & Sharing: Credentials are private by default, with three sharing permission levels (use/edit/admin)
- Expiry & Rotation: Set expiration dates and rotation cycles, with automatic 6-hour background checks
- Connectivity Testing: Real network connectivity tests (SSH login, SNMP query, database connection, etc.)
- Usage Records: Automatic audit trail for every use (30-day retention)
- Smart Recommendations: Auto-suggests best-matching credentials in SSH/remote execution/agent install contexts
For full details, see the Help Documentation within OpsDash.
2.14 Tag Management
Lightweight cross-resource tagging: centralized tag color/category/description management, smart auto-tagging rules, multi-tag AND/OR filtering, batch tagging, tag merging, and usage analytics.
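The difference between AND and OR tag filtering comes down to a subset test versus an intersection test. A minimal sketch, with a hypothetical function name and example tags:

```python
def matches_tags(resource_tags, selected_tags, mode="AND"):
    """Return True if the resource passes the tag filter."""
    resource_tags, selected_tags = set(resource_tags), set(selected_tags)
    if not selected_tags:
        return True  # no filter selected, everything matches
    if mode == "AND":
        return selected_tags <= resource_tags  # all selected tags present
    if mode == "OR":
        return bool(selected_tags & resource_tags)  # at least one present
    raise ValueError(f"unknown mode: {mode}")


device_tags = ["production", "core"]
print(matches_tags(device_tags, ["production", "edge"], mode="AND"))
print(matches_tags(device_tags, ["production", "edge"], mode="OR"))
```

Here the device carries only one of the two selected tags, so it is excluded by the AND filter and included by the OR filter.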
For full details, see the Help Documentation within OpsDash.
2.15 Notification Channels
4 notification channels: Email, DingTalk Webhook, Feishu Webhook, WeCom Webhook. Supports escalation policies (multi-step progressive notification).
For full details, see the Help Documentation within OpsDash.
2.16 - 2.27 More Features
OpsDash also includes:
- 2.16 Monitor Templates -- Zabbix monitoring template management
- 2.17 Network Profiles -- 10 vendor CLI adapter configurations
- 2.18 Proxy Management -- Zabbix Proxy full lifecycle (install/config/health/recovery/TLS/Proxy Groups)
- 2.19 API Token Management -- API tokens for third-party integrations
- 2.20 Audit Log -- Full operation audit trail (change diffs, IP tracking)
- 2.21 AI Assistant -- MCP Server integration with 189 AI tools
- 2.22 Recycle Bin -- Soft-delete data recovery
- 2.23 Engine Status -- Zabbix/EMQX/VM runtime health overview
- 2.24 Help Documentation -- In-app Markdown help system
- 2.25 Mobile Support -- 18 responsive pages for mobile devices
- 2.26 License -- Hardware fingerprint + RSA signature licensing
- 2.27 About -- Version info and contact
For full details, see the Help Documentation within OpsDash.
3. External Systems
The consoles of the external systems are used during initial deployment setup and advanced troubleshooting. For daily monitoring, use the Engine Status page within OpsDash.
- Zabbix: http://localhost:8080, default Admin / zabbix. Communicates with OpsDash via JSON-RPC API
- EMQX: http://localhost:28083, default admin / public. IoT MQTT message broker
- VictoriaMetrics: http://localhost:8428/vmui. Unified time-series database, 90-day retention
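For troubleshooting outside the vmui console, VictoriaMetrics can also be queried over its Prometheus-compatible HTTP API (`/api/v1/query`). The sketch below builds and runs an instant query with the standard library only. The metric name `cpu_usage_percent` is a placeholder rather than a metric OpsDash is known to write, and the helper names are illustrative.

```python
import json
import urllib.parse
import urllib.request

VM_QUERY_URL = "http://localhost:8428"  # same value as VM_QUERY_URL in .env


def build_query_url(base_url: str, promql: str) -> str:
    """URL for an instant query on the Prometheus-compatible API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def instant_query(base_url: str, promql: str, timeout: float = 5.0) -> list:
    """Run the query and return the list of result series."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["data"]["result"]


# Placeholder metric name; substitute one listed in the Metrics Viewer.
print(build_query_url(VM_QUERY_URL, "avg(cpu_usage_percent)"))
```

With the container running, `instant_query(VM_QUERY_URL, "avg(cpu_usage_percent)")` returns the matching series; an empty list suggests no data has been written under that metric name.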
Full content with PromQL examples and integration configuration is available in OpsDash's Help Documentation.
4. Configuration
Configuration covers the environment variable reference, the port map, and Docker container management. Key environment variables:
- DATABASE_URL -- PostgreSQL connection (Docker-mapped to port 5433)
- REDIS_URL -- Redis connection (port 6381)
- SECRET_KEY -- JWT signing key (must change for production)
- CREDENTIAL_KEY -- Fernet encryption key (must set for production)
- ZABBIX_URL + ZABBIX_API_TOKEN -- Zabbix integration
- EMQX_API_URL + EMQX_API_KEY + EMQX_API_SECRET -- EMQX integration
- VM_QUERY_URL + VM_WRITE_URL -- VictoriaMetrics (port 8428)
Full environment variable table and port map available in OpsDash's Help Documentation.
5. FAQ
Forgot admin password?
Generate a new password hash with python3 -c "from app.auth import get_password_hash; print(get_password_hash('new_password'))", then connect to PostgreSQL and UPDATE the admin user record with that hash. Alternatively, delete the admin user and re-run python3 -m app seed to restore the default password admin123.
Device status stuck on "unknown"?
Verify the monitoring engine is running, the device has a monitoring protocol and management IP configured, and SNMP credentials are correct. New devices may take up to 60 seconds to start monitoring.
Zabbix / EMQX connection failed?
Check that containers are running, .env is configured correctly, and API tokens have been created. Zabbix Server may take 2-5 minutes to initialize on first start.
No data in VictoriaMetrics?
Verify the container is running, VM_QUERY_URL and VM_WRITE_URL in .env are correct, and the monitoring engine is actively collecting data.
IoT device can't connect via MQTT?
Check that the EMQX container is running, MQTT port 11883 is reachable, and device authentication credentials are correct.
More FAQs available in OpsDash's Help Documentation.