Node Health Monitoring
The node health matrix shows each cluster node as a status card with its health state, system information, and resource usage, refreshed automatically every 30 seconds.
Node Health Matrix Overview
Individual Node Cards
Each cluster node is displayed as a detailed status card containing comprehensive health and system information:
![Figure needed]
Screenshot of node health matrix showing multiple node cards
Card Layout
- Grid Display: Nodes arranged in responsive grid layout
- Consistent Design: Uniform card design across all nodes
- Real-time Updates: All cards update simultaneously every 30 seconds
- Interactive Elements: Expandable sections for detailed information
Node Card Components
Header Information
![Figure needed]
Screenshot of node card header showing basic node information
Node Identification
- Node Name: Kubernetes node identifier (e.g., "worker-node-1")
- Operating System: OS type and architecture (e.g., "linux/amd64")
- Instance Type: vCloud instance configuration (e.g., "standard-4vcpu-8gb")
- Uptime: Duration the node has been running (e.g., "3 days, 14 hours")
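As a small illustration of the uptime format in the example above, the sketch below renders a duration as days and hours. The function name `format_uptime` is hypothetical, and how the dashboard actually obtains the duration is not stated here; the sketch simply takes a `timedelta`.

```python
from datetime import timedelta


def format_uptime(uptime: timedelta) -> str:
    """Render a duration as 'D days, H hours' (minutes are dropped)."""
    total_hours = int(uptime.total_seconds() // 3600)
    days, hours = divmod(total_hours, 24)
    day_label = "day" if days == 1 else "days"
    hour_label = "hour" if hours == 1 else "hours"
    return f"{days} {day_label}, {hours} {hour_label}"


print(format_uptime(timedelta(days=3, hours=14, minutes=25)))  # 3 days, 14 hours
```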
Health Status Section
Status Indicators
Visual health indicators with clear iconography:
![Figure needed]
Screenshot showing different node health states
- ✅ Ready: Node is healthy and accepting pods
  - Green indicator: All systems operational
  - Available for scheduling: Can accept new pods
  - All checks passing: System health checks successful
- ⚠️ Warning: Node has issues but is functional
  - Yellow indicator: Some issues detected
  - Limited functionality: May have reduced capacity
  - Monitoring required: Needs attention but operational
- ❌ Not Ready: Node is not available for scheduling
  - Red indicator: Critical issues present
  - No new pods: Cannot accept new workloads
  - Investigation needed: Requires immediate attention
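One plausible way to arrive at the three states is sketched below. The rule is an assumption, not the dashboard's documented logic: a node whose Kubernetes `Ready` condition is not `"True"` is treated as Not Ready, a ready node with any pressure condition set to `"True"` is treated as Warning, and everything else is Ready. The function name `card_status` is hypothetical; the condition type names are the standard Kubernetes ones.

```python
# Standard Kubernetes node condition types that signal a problem when "True".
PROBLEM_CONDITIONS = (
    "MemoryPressure",
    "DiskPressure",
    "PIDPressure",
    "NetworkUnavailable",
)


def card_status(conditions: dict) -> str:
    """Map condition type -> status string to a card state (assumed rule)."""
    # A missing or non-"True" Ready condition means the node cannot be scheduled.
    if conditions.get("Ready") != "True":
        return "Not Ready"
    # The node is schedulable, but at least one problem condition is active.
    if any(conditions.get(name) == "True" for name in PROBLEM_CONDITIONS):
        return "Warning"
    return "Ready"


print(card_status({"Ready": "True", "MemoryPressure": "False"}))
print(card_status({"Ready": "True", "DiskPressure": "True"}))
print(card_status({"Ready": "Unknown"}))
```

The three calls cover a healthy node, a node under disk pressure, and a node whose readiness is unknown.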
Health Status Details
- Status Tooltips: Hover for detailed status information
- Condition Explanations: Clear explanation of current state
- Time Information: When status last changed
- Impact Assessment: What the status means for workloads
Resource Usage Monitoring
CPU Usage Display
![Figure needed]
Screenshot of CPU usage display with progress bar and details
Information Shown:
- Current Usage: Cores and percentage (e.g., "2.1 cores (52.5%)")
- Visual Progress Bar: Color-coded utilization indicator
- Allocatable Capacity: Total available CPU cores
- Usage Trend: Indicators for increasing/decreasing usage
Color Coding:
- Green (0-70%): Healthy CPU utilization
- Orange (70-85%): High utilization, monitor closely
- Red (85%+): Critical utilization, action needed
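The color bands and the usage label can be sketched as follows. The listed ranges share their boundary values, so this sketch assumes that exactly 70% falls in the orange band and exactly 85% in the red band. The function names are hypothetical, and the 4-core capacity in the example is an assumption chosen to reproduce the "2.1 cores (52.5%)" label.

```python
def usage_color(percent: float) -> str:
    """Return the progress bar color for a utilization percentage."""
    if percent < 0:
        raise ValueError("utilization cannot be negative")
    if percent >= 85:  # assumed: the boundary value belongs to the higher band
        return "red"
    if percent >= 70:
        return "orange"
    return "green"


def cpu_usage_label(used_cores: float, allocatable_cores: float) -> str:
    """Format usage as cores plus percentage of allocatable capacity."""
    percent = used_cores / allocatable_cores * 100
    return f"{used_cores:.1f} cores ({percent:.1f}%)"


print(cpu_usage_label(2.1, 4))  # 2.1 cores (52.5%)
print(usage_color(52.5))        # green
```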
Memory Usage Display
![Figure needed]
Screenshot of memory usage with detailed breakdown
Information Shown:
- Current Usage: Memory and percentage (e.g., "6.2 GB (77.5%)")
- Visual Progress Bar: Same color scheme as CPU
- Memory Capacity: Total node memory capacity
- Memory Breakdown: Used vs. available memory
Memory Categories:
- Used: Currently allocated memory
- Available: Free memory for new allocations
- Reserved: System reserved memory
- Total: Complete node memory capacity
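A minimal sketch of how the four categories might relate, assuming that total memory is the sum of used, available, and reserved memory, and that the percentage is measured against total capacity. Neither assumption is confirmed by this page. The class name `MemoryBreakdown` and the 8 GB / 0.5 GB figures are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryBreakdown:
    total_gb: float
    used_gb: float
    reserved_gb: float

    @property
    def available_gb(self) -> float:
        # Assumed relation: available = total - used - reserved, never negative.
        return max(self.total_gb - self.used_gb - self.reserved_gb, 0.0)

    @property
    def used_percent(self) -> float:
        # Assumed: percentage is taken against total node memory.
        return self.used_gb / self.total_gb * 100

    def label(self) -> str:
        return f"{self.used_gb:.1f} GB ({self.used_percent:.1f}%)"


node_memory = MemoryBreakdown(total_gb=8.0, used_gb=6.2, reserved_gb=0.5)
print(node_memory.label())  # 6.2 GB (77.5%)
print(f"available: {node_memory.available_gb:.1f} GB")
```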
Pod Allocation Section
Pod Count Information
![Figure needed]
Screenshot of pod allocation display showing current vs capacity
Metrics Displayed:
- Running Pods: Current number of pods on the node
- Pod Capacity: Maximum pods the node can handle
- Utilization Percentage: Pod allocation as percentage of capacity
- Allocation Efficiency: How effectively pods are distributed
Load Indicators
- High Load: Node approaching pod capacity
- Medium Load: Balanced pod allocation
- Low Load: Node has significant pod capacity available
- Optimization: Indicators for rebalancing opportunities
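The utilization percentage is running pods divided by pod capacity. The load levels above are not given numeric thresholds, so the 80% and 40% cut-offs in this sketch are purely illustrative assumptions, as are the function names.

```python
def pod_utilization(running_pods: int, pod_capacity: int) -> float:
    """Pod allocation as a percentage of the node's pod capacity."""
    if pod_capacity <= 0:
        raise ValueError("pod capacity must be positive")
    return running_pods / pod_capacity * 100


def load_level(utilization_percent: float,
               high: float = 80.0,
               medium: float = 40.0) -> str:
    """Classify load; the default thresholds are assumptions, not documented."""
    if utilization_percent >= high:
        return "High Load"
    if utilization_percent >= medium:
        return "Medium Load"
    return "Low Load"


for running in (95, 55, 10):
    percent = pod_utilization(running, 110)
    print(f"{running}/110 pods ({percent:.1f}%): {load_level(percent)}")
```

The example uses a capacity of 110 pods per node, which is the kubelet default; the actual capacity depends on the node configuration.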
System Information (Expandable)
Accessing Detailed Information
Click the expand button or section to view comprehensive system details:
![Figure needed]
Screenshot of expanded system information section
Software Information
- Kubernetes Version: Node's Kubernetes version (e.g., "v1.32.0")
- Container Runtime: Runtime version (e.g., "containerd 1.7.2")
- Kernel Version: Operating system kernel version
- OS Distribution: Linux distribution and version
Hardware Information
- Instance Details: vCloud instance type and specifications
- CPU Architecture: Processor architecture (x86_64, ARM, etc.)
- Memory Configuration: Total memory and allocation
- Storage Configuration: Local storage details
Location Information
- Region: Geographic region placement
- Availability Zone: Specific zone within region
- Resource Group: Associated vCloud resource group
- Network: Network configuration and connectivity
System Identifiers
- Hostname: System hostname
- Machine ID: Unique machine identifier
- Boot ID: Current boot session identifier
- System UUID: Hardware system UUID
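Most of these fields have direct counterparts on the Kubernetes Node object, so a card could plausibly be filled from it. The sketch below assumes the node is available as a dictionary in the shape returned by `kubectl get node <name> -o json`; whether the dashboard reads the data this way is an assumption. The function name `system_info`, the output keys, and the sample values are hypothetical. The vCloud-specific fields (resource group, network) are not covered.

```python
from typing import Any, Dict, Optional


def system_info(node: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Collect card fields from a Kubernetes Node object given as a dict."""
    labels = node.get("metadata", {}).get("labels", {})
    status = node.get("status", {})
    info = status.get("nodeInfo", {})
    # The hostname is one of the entries in status.addresses.
    hostname = next(
        (a.get("address") for a in status.get("addresses", [])
         if a.get("type") == "Hostname"),
        None,
    )
    runtime = info.get("containerRuntimeVersion")  # e.g. "containerd://1.7.2"
    return {
        "kubernetes_version": info.get("kubeletVersion"),
        "container_runtime": runtime.replace("://", " ") if runtime else None,
        "kernel_version": info.get("kernelVersion"),
        "os_distribution": info.get("osImage"),
        "cpu_architecture": info.get("architecture"),
        "instance_type": labels.get("node.kubernetes.io/instance-type"),
        "region": labels.get("topology.kubernetes.io/region"),
        "availability_zone": labels.get("topology.kubernetes.io/zone"),
        "hostname": hostname,
        "machine_id": info.get("machineID"),
        "boot_id": info.get("bootID"),
        "system_uuid": info.get("systemUUID"),
    }


sample_node = {
    "metadata": {"labels": {
        "node.kubernetes.io/instance-type": "standard-4vcpu-8gb",
        "topology.kubernetes.io/zone": "zone-a",  # illustrative value
    }},
    "status": {
        "addresses": [{"type": "Hostname", "address": "worker-node-1"}],
        "nodeInfo": {
            "kubeletVersion": "v1.32.0",
            "containerRuntimeVersion": "containerd://1.7.2",
            "architecture": "amd64",
        },
    },
}
details = system_info(sample_node)
print(details["container_runtime"], details["hostname"])
```

Fields missing from the node object come back as `None`, so a card can show a placeholder instead of failing.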
Node Conditions Monitoring
Health Conditions
Detailed health condition indicators:
![Figure needed]
Screenshot of node conditions section showing various health checks
Condition Types
- Memory Pressure: Available memory status
  - False: Sufficient memory available
  - True: Memory pressure detected, may affect scheduling
- Disk Pressure: Available disk space status
  - False: Adequate disk space
  - True: Disk space low, may impact operations
- PID Pressure: Process ID availability status
  - False: Sufficient PIDs available
  - True: Process limit approaching
- Network Connectivity: Network health status
  - Ready: Network connectivity functional
  - Issues: Network problems detected
Condition Details
- Status: Current condition state (True/False/Unknown)
- Last Transition: When condition last changed
- Reason: Explanation for current state
- Message: Detailed information about condition
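The four detail fields line up with the entries in `status.conditions` on a Kubernetes Node object (`status`, `lastTransitionTime`, `reason`, `message`). The sketch below turns those entries into display rows and flags unhealthy ones. It assumes the node is supplied as a dictionary in the Kubernetes API shape; `ConditionRow` and `condition_rows` are hypothetical names, and the sample data is illustrative.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class ConditionRow:
    type: str
    status: str           # "True", "False", or "Unknown"
    last_transition: str
    reason: str
    message: str
    healthy: bool


def condition_rows(node: Dict[str, Any]) -> List[ConditionRow]:
    """Build one row per node condition and mark whether it is healthy."""
    rows = []
    for cond in node.get("status", {}).get("conditions", []):
        ctype = cond.get("type", "")
        status = cond.get("status", "Unknown")
        # Ready is healthy when "True"; pressure-style conditions when "False".
        expected = "True" if ctype == "Ready" else "False"
        rows.append(ConditionRow(
            type=ctype,
            status=status,
            last_transition=cond.get("lastTransitionTime", ""),
            reason=cond.get("reason", ""),
            message=cond.get("message", ""),
            healthy=(status == expected),
        ))
    return rows


sample = {"status": {"conditions": [
    {"type": "MemoryPressure", "status": "False",
     "lastTransitionTime": "2025-01-10T08:00:00Z",
     "reason": "KubeletHasSufficientMemory",
     "message": "kubelet has sufficient memory available"},
    {"type": "DiskPressure", "status": "True",
     "lastTransitionTime": "2025-01-12T09:30:00Z",
     "reason": "KubeletHasDiskPressure",
     "message": "kubelet has disk pressure"},
    {"type": "Ready", "status": "True",
     "lastTransitionTime": "2025-01-10T08:00:00Z",
     "reason": "KubeletReady",
     "message": "kubelet is posting ready status"},
]}}
rows = condition_rows(sample)
print([row.type for row in rows if not row.healthy])
```

A status of `Unknown` is treated as unhealthy for every condition type, since it matches neither expected value.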
Real-time Monitoring Features
Data Refresh Behavior
- Automatic Updates: All node cards update every 30 seconds
- Synchronized Updates: All nodes update simultaneously
- Background Processing: Updates don't interrupt user interaction
- Status Consistency: Ensures consistent view across all nodes
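The refresh behavior described above can be sketched as a polling loop that fetches one snapshot covering all nodes and hands it to the renderer in a single step, so every card reflects the same point in time. This is an illustration under stated assumptions, not the dashboard's implementation: `fetch_all`, `render`, and `run_refresh_loop` are hypothetical names, and the `cycles` and `sleep` parameters exist only to make the loop testable.

```python
import time
from typing import Callable, Dict, Optional


def run_refresh_loop(
    fetch_all: Callable[[], Dict[str, str]],
    render: Callable[[Dict[str, str]], None],
    interval_seconds: float = 30.0,
    cycles: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Refresh all node cards together every interval_seconds."""
    completed = 0
    while cycles is None or completed < cycles:
        snapshot = fetch_all()  # one fetch for every node
        render(snapshot)        # all cards are updated from the same snapshot
        completed += 1
        if cycles is None or completed < cycles:
            sleep(interval_seconds)


# Demo with stand-in callables and no real waiting.
rendered = []
run_refresh_loop(
    fetch_all=lambda: {"worker-node-1": "Ready", "worker-node-2": "Warning"},
    render=rendered.append,
    cycles=2,
    sleep=lambda seconds: None,
)
print(len(rendered))  # 2
```

To keep updates from interrupting user interaction, a loop like this would typically run on a background thread or timer rather than on the UI thread.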
Performance Indicators
- Resource Trends: Visual indicators for usage trends
- Capacity Warnings: Alerts for approaching limits
- Health Notifications: Clear indicators for health changes
- Load Distribution: Visual representation of cluster load balance
Using Node Health Information
Daily Operations
- Health Overview: Quick scan of all node health indicators
- Resource Check: Verify no nodes are resource-constrained
- Capacity Planning: Identify nodes approaching capacity limits
- Issue Detection: Spot problematic nodes requiring attention
Troubleshooting
- Problem Identification: Find nodes with health issues
- Resource Diagnosis: Identify resource pressure points
- Capacity Analysis: Understand resource utilization patterns
- Performance Investigation: Deep dive into node performance
Capacity Planning
- Usage Patterns: Analyze resource usage across nodes
- Load Distribution: Ensure even workload distribution
- Scaling Decisions: Identify when to add or remove nodes
- Optimization: Find opportunities for better resource utilization
Best Practices
Regular Monitoring
- Daily Checks: Review node health matrix daily
- Pattern Recognition: Notice trends in resource usage
- Proactive Management: Address issues before they become critical
- Documentation: Record significant observations
Performance Optimization
- Load Balancing: Ensure even distribution across nodes
- Resource Right-sizing: Match node specs to workload needs
- Capacity Management: Maintain appropriate headroom
- Health Maintenance: Keep all nodes in healthy state
Issue Response
- Quick Assessment: Use health matrix for rapid issue assessment
- Prioritization: Focus on critical health issues first
- Impact Analysis: Understand how node issues affect workloads
- Escalation: Know when to escalate to support
Next: Learn about Resource Analytics for pod distribution and namespace analysis.