Running CentOS in production requires more than just installation and configuration. When critical issues arise—disk failures, RAID degradation, memory exhaustion, or service outages—you need immediate notification and clear troubleshooting procedures.
This guide covers comprehensive monitoring and alerting strategies for CentOS systems, with particular focus on RAID array health and hardware failures. We will explore multiple notification channels, from simple email alerts to integrated dashboard solutions.
Why Monitoring Matters
The Cost of Unnoticed Failures
Issue Type
Downtime Cost
Detection Method
Single disk failure (RAID 1)
Zero (if detected)
SMART monitoring
RAID degradation
Hours to days
Array scrubbing
Memory exhaustion
Minutes
Process monitoring
Full filesystem
Variable
Capacity alerts
Service crash
Until restart
Process supervision
Without monitoring, a simple RAID 1 disk failure escalates to data loss when the second disk fails—often within weeks or months of the first.
Core Monitoring Components
1. RAID Array Monitoring (mdadm)
Linux software RAID requires active monitoring through the mdadm utility.
Checking RAID Status
1 2 3 4 5 6 7 8 9 10 11 12 13
# Overall RAID status cat /proc/mdstat
# Detailed array information mdadm --detail /dev/md0 mdadm --detail /dev/md1
# Check memory usage MEMORY_USAGE=$(free | grep Mem | awk '{printf "%.0f", $3/$2 * 100.0}') if [ "$MEMORY_USAGE" -gt "$MEMORY_THRESHOLD" ]; then echo"High memory usage: ${MEMORY_USAGE}%" | \ mail -s "ALERT: Memory ${MEMORY_USAGE}% on $(hostname)" admin@example.com fi
# Check CPU load (5-minute average) CPU_LOAD=$(uptime | awk -F'load average:''{print $2}' | awk '{print $2}' | sed 's/,//') CPU_INT=$(echo"$CPU_LOAD * 100 / 1" | bc) if [ "$CPU_INT" -gt "$((100 * $(nproc)))" ]; then echo"High CPU load: $CPU_LOAD" | \ mail -s "ALERT: CPU Load $CPU_LOAD on $(hostname)" admin@example.com fi
# Check disk usage for DISK in $(df -h | grep '/dev/' | awk '{print $1}'); do USAGE=$(df -h "$DISK" | awk 'NR==2 {print $5}' | sed 's/%//') if [ "$USAGE" -gt "$DISK_THRESHOLD" ]; then echo"Disk $DISK at ${USAGE}% capacity" | \ mail -s "ALERT: Disk Space ${USAGE}% on $(hostname)" admin@example.com fi done
Notification Boards and Dashboards
Option 1: Simple Web Dashboard
Create a status dashboard using a lightweight web server.