NodeGroup Troubleshooting
Common NodeGroup issues and step-by-step solutions.
NodeGroup Creation Issues
Creation Button Disabled
Symptoms: The "Create NodeGroup" button is grayed out or unresponsive
Causes:
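- The cluster is not in the "Active" state
- NodeGroup creation is not available after the initial cluster deployment
- Insufficient permissions
- vCloud quota limits have been reached
Solutions:
- Verify that the cluster status is "Active"
- Confirm that your account has permission to create NodeGroups
- Check vCloud resource quotas
- If NodeGroup creation is unavailable after deployment, contact support for assistance or consider recreating the cluster with the desired NodeGroups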
NodeGroup Creation Failures
Symptoms: The NodeGroup stays in "Creating" status, or creation times out
Solutions:
- Allow 10-15 minutes for creation to complete
- Check vCloud resource quotas
- Verify network and security settings
- Try a different instance type or availability zone
- Contact support for persistent failures
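While waiting out the creation window, node readiness can be polled from Kubernetes instead of refreshing the console. A minimal sketch in POSIX shell, assuming kubectl access to the cluster; the function name is illustrative, and the optional label selector is a hypothetical placeholder, since the label that identifies a NodeGroup's nodes depends on the provider. Without a selector, nodes from existing NodeGroups are counted too.

```sh
# Poll until at least EXPECTED nodes report Ready, or give up.
# Usage: wait_for_ready_nodes EXPECTED [MAX_ATTEMPTS] [INTERVAL_SECONDS] [SELECTOR]
# Defaults (60 attempts, 15 s apart) cover roughly a 15-minute window.
# SELECTOR is a hypothetical, provider-specific node label selector.
wait_for_ready_nodes() {
  expected="$1"
  max_attempts="${2:-60}"
  interval="${3:-15}"
  selector="${4:-}"
  attempt=1
  while :; do
    if [ -n "$selector" ]; then
      nodes=$(kubectl get nodes -l "$selector" --no-headers 2>/dev/null)
    else
      nodes=$(kubectl get nodes --no-headers 2>/dev/null)
    fi
    # STATUS is column 2; a cordoned node shows "Ready,SchedulingDisabled".
    ready=$(printf '%s\n' "$nodes" |
      awk '$2 == "Ready" || $2 ~ /^Ready,/ { n++ } END { print n + 0 }')
    echo "attempt ${attempt}/${max_attempts}: ${ready}/${expected} nodes Ready"
    if [ "$ready" -ge "$expected" ]; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "gave up waiting for ${expected} Ready nodes" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
}
```

For example, `wait_for_ready_nodes 3` waits for three Ready nodes; a non-zero exit status after the full window is a reasonable point to contact support.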
NodeGroup Scaling Issues
Scaling Operations Not Working
Symptoms: The NodeGroup stays in "Scaling" status, or the node count never reaches the desired count
Scale-Up Issues:
- Check vCloud compute quota
- Verify availability zone capacity
- Try a different instance type
- Scale in smaller increments
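Scaling in smaller increments means submitting several modest desired-count changes instead of one large jump, letting each finish before the next. A small sketch that prints such a schedule; the function name and the default step of 2 are illustrative assumptions, and each printed count still has to be applied through the console.

```sh
# Print intermediate desired counts for scaling up from CURRENT to TARGET,
# adding at most STEP nodes per request. Prints nothing if TARGET <= CURRENT.
# Usage: scale_steps CURRENT TARGET [STEP]
scale_steps() {
  current="$1"
  target="$2"
  step="${3:-2}"
  if [ "$step" -le 0 ]; then
    echo "STEP must be a positive integer" >&2
    return 1
  fi
  while [ "$current" -lt "$target" ]; do
    current=$((current + step))
    # Do not overshoot the final target on the last increment.
    if [ "$current" -gt "$target" ]; then
      current="$target"
    fi
    echo "$current"
  done
}
```

For example, `scale_steps 3 10 3` prints 6, 9 and 10, one per line.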
Scale-Down Issues:
- Drain the affected nodes manually before scaling down
- Check PodDisruptionBudgets that may block pod eviction
- Ensure attached persistent volumes can be detached
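The manual drain and PodDisruptionBudget check can be sketched with standard kubectl commands. This assumes kubectl access; the function names and the 300-second default timeout are illustrative. `--delete-emptydir-data` discards emptyDir contents and requires kubectl 1.20 or later (older versions used `--delete-local-data`).

```sh
# List PodDisruptionBudgets that currently allow zero disruptions.
# Pods covered by these cannot be evicted, so "kubectl drain" keeps
# retrying until it times out.
blocking_pdbs() {
  kubectl get pdb -A --no-headers \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,ALLOWED:.status.disruptionsAllowed' |
    awk '$3 == "0" { print $1 "/" $2 }'
}

# Drain one node before scale-down. "kubectl drain" cordons the node first,
# then evicts its pods while respecting PodDisruptionBudgets.
#   --ignore-daemonsets     proceed although DaemonSet pods are present (left in place)
#   --delete-emptydir-data  allow evicting pods that use emptyDir (that data is lost)
# Usage: drain_node NODE [TIMEOUT]
drain_node() {
  node="$1"
  timeout="${2:-300s}"
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout="$timeout"
}
```

Run `blocking_pdbs` first; if it lists anything, relax that budget or scale the workload up before calling `drain_node`.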
Unexpected Node Counts
Symptoms: The actual number of nodes does not match the desired count
Solutions:
- Allow time for automatic reconciliation
- Check individual node health in Kubernetes
- Contact support for persistent discrepancies
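To check node health from the Kubernetes side, the node list can be compared against the desired count. A sketch assuming kubectl access; the function name is illustrative and the optional label selector is a hypothetical placeholder (list the real node labels with `kubectl get nodes --show-labels`).

```sh
# Compare the node count Kubernetes reports with the NodeGroup's desired
# count, and list every node whose STATUS is not plain "Ready" (this
# includes NotReady, Unknown, and cordoned "Ready,SchedulingDisabled").
# Returns 0 only when the counts match.
# Usage: check_node_count DESIRED [SELECTOR]
# SELECTOR is a hypothetical, provider-specific node label selector.
check_node_count() {
  desired="$1"
  selector="${2:-}"
  if [ -n "$selector" ]; then
    nodes=$(kubectl get nodes -l "$selector" --no-headers 2>/dev/null)
  else
    nodes=$(kubectl get nodes --no-headers 2>/dev/null)
  fi
  actual=$(printf '%s\n' "$nodes" | awk 'NF > 0 { n++ } END { print n + 0 }')
  printf '%s\n' "$nodes" |
    awk 'NF > 0 && $2 != "Ready" { print "check node: " $1 " status=" $2 }'
  echo "desired=${desired} actual=${actual}"
  [ "$actual" -eq "$desired" ]
}
```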
NodeGroup Deletion Issues
Delete Button Disabled
Protection Analysis (deletion is blocked if any of these applies):
- Last NodeGroup Check: This is the last remaining NodeGroup in the cluster
- Master NodeGroup Check: This is a master/control-plane NodeGroup
- Status Check: The NodeGroup status is not "Ready"
- Role Check: This is not a worker NodeGroup
Solutions by Protection Type:
- Last NodeGroup: Create an additional NodeGroup before deleting this one
- Master NodeGroup: Master NodeGroups cannot be deleted (permanent protection)
- Status: Wait for the status to become "Ready"
- Role: Only worker NodeGroups can be deleted
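The protection checks can be read as a decision procedure. A hypothetical model of it in shell, useful for reasoning about why the button is disabled: it calls no real API, its inputs are values read from the console, and NODEGROUP_COUNT is assumed to be the total number of NodeGroups in the cluster. Permanent protections are checked first so the reported reason points to an action that can actually help.

```sh
# Hypothetical model of the delete-button protection checks.
# Prints the first blocking reason (exit 1) or "deletable" (exit 0).
# Usage: delete_block_reason ROLE STATUS NODEGROUP_COUNT
delete_block_reason() {
  role="$1"
  status="$2"
  count="$3"
  # Permanent protections first: no user action can lift these.
  case "$role" in
    master|control-plane)
      echo "master NodeGroup: permanently protected"
      return 1 ;;
    worker) ;;
    *)
      echo "role '${role}' is not worker: only worker NodeGroups can be deleted"
      return 1 ;;
  esac
  if [ "$count" -le 1 ]; then
    echo "last NodeGroup: create another NodeGroup first"
    return 1
  fi
  if [ "$status" != "Ready" ]; then
    echo "status is '${status}': wait for Ready"
    return 1
  fi
  echo "deletable"
}
```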
Deletion Process Failures
Symptoms: Deletion starts but fails to complete
Solutions:
- Manually remove dependencies (pods, volumes, load balancers)
- Force pod evacuation if needed
- Contact support for stuck deletions
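Before removing dependencies by hand, it helps to see what is still tied to the NodeGroup's nodes. A sketch using standard kubectl queries, assuming kubectl access and known node names; the function name is illustrative.

```sh
# List what still ties a node to the cluster while its NodeGroup is deleted.
# Usage: node_dependencies NODE
node_dependencies() {
  node="$1"
  echo "== Pods scheduled on ${node}"
  kubectl get pods -A -o wide --field-selector "spec.nodeName=${node}"
  echo "== VolumeAttachments on ${node}"
  kubectl get volumeattachments --no-headers \
    -o custom-columns='NAME:.metadata.name,NODE:.spec.nodeName' |
    awk -v n="$node" '$2 == n { print $1 }'
  # LoadBalancer Services are cluster-wide rather than per node, but their
  # external load balancers may still target the node being removed.
  echo "== LoadBalancer Services"
  kubectl get svc -A --no-headers |
    awk '$3 == "LoadBalancer" { print $1 "/" $2 }'
}
```

If pods remain after a normal drain, adding `--force` to `kubectl drain` also deletes pods that no controller manages; those pods are not recreated.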
NodeGroup Status Issues
Status Not Updating
Symptoms: NodeGroup status appears outdated or inconsistent
Solutions:
- Use the refresh button or reload the page
- Clear the browser cache and cookies
- Allow 30-60 seconds for status changes to propagate
- Try a different browser or a private (incognito) window
Stuck Status Conditions
Common Stuck States:
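- Creating: Allow 15-20 minutes before treating it as stuck; if it is still "Creating" after that, contact support
- Scaling: Allow 10-15 minutes; if it is still "Scaling" after that, check resource availability
- Error: Review the error details; the NodeGroup may need to be recreated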
Performance Issues
Poor NodeGroup Performance
Analysis Areas:
- Resource utilization (CPU/memory usage)
- Instance types and hardware selection
- Network latency and throughput
- Storage performance characteristics
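For the resource utilization check, `kubectl top nodes` output can be filtered down to the nodes worth investigating. A sketch assuming metrics-server is installed (`kubectl top` needs it) and that the CPU and memory percentages are the third and fifth columns, as in current kubectl output; the function name and the 80% default threshold are arbitrary examples, not documented limits.

```sh
# Print nodes whose CPU% or MEMORY% exceeds THRESHOLD (default 80).
# Expected columns: NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
# Usage: hot_nodes [THRESHOLD_PERCENT]
hot_nodes() {
  threshold="${1:-80}"
  kubectl top nodes --no-headers |
    awk -v t="$threshold" '{
      cpu = $3; mem = $5
      sub(/%/, "", cpu); sub(/%/, "", mem)
      # Non-numeric values such as <unknown> compare as 0.
      if (cpu + 0 > t + 0 || mem + 0 > t + 0) print $1, "cpu=" $3, "mem=" $5
    }'
}
```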
Solutions:
- Move to higher-performance instance types
- Distribute workloads across multiple NodeGroups
- Optimize pod resource requests and limits
- Use anti-affinity to spread workloads
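Two of these solutions, explicit resource requests and limits plus anti-affinity, can be combined in one workload manifest. A sketch that prints a Deployment using preferred (soft) pod anti-affinity on the well-known `kubernetes.io/hostname` label, so replicas spread across nodes but can still be scheduled when there are fewer nodes than replicas. The function name, image, and resource values are placeholders to tune, not recommendations.

```sh
# Print a Deployment manifest that spreads replicas across nodes and sets
# explicit resource requests/limits. Values below are placeholders.
# Usage: deployment_with_spread NAME IMAGE [REPLICAS]
deployment_with_spread() {
  name="$1"
  image="$2"
  replicas="${3:-3}"
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${name}
spec:
  replicas: ${replicas}
  selector:
    matchLabels:
      app: ${name}
  template:
    metadata:
      labels:
        app: ${name}
    spec:
      affinity:
        podAntiAffinity:
          # Soft rule: prefer one replica per node, but still schedule
          # if there are fewer nodes than replicas.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    app: ${name}
      containers:
        - name: ${name}
          image: ${image}
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              memory: 512Mi
EOF
}
```

Apply it with `deployment_with_spread web nginx:1.25 3 | kubectl apply -f -`.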
Getting Support
Information to Gather
NodeGroup Information:
- NodeGroup name and ID
- Cluster name and ID
- Current status and error messages
- NodeGroup configuration details
Diagnostic Commands (if kubectl access is available):
```sh
# Check node status
kubectl get nodes -o wide
# Check pod distribution
kubectl get pods -A -o wide
# Check resource usage (requires metrics-server)
kubectl top nodes
# Check the 20 most recent events
kubectl get events -A --sort-by='.lastTimestamp' | tail -20
```
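To attach these outputs to a support ticket, the same commands (plus `kubectl describe nodes`) can be saved into one directory. A sketch assuming kubectl access; the function names, directory name, and file names are arbitrary choices. Unlike the interactive command, the full event list is saved rather than only the last 20 lines.

```sh
# Run one command, saving stdout and stderr to FILE inside $diag_dir.
# A failure is recorded in the file instead of aborting the collection
# (e.g. "kubectl top" without metrics-server).
_diag_run() {
  file="$1"
  shift
  "$@" > "${diag_dir}/${file}" 2>&1 || echo "command failed: $*" >> "${diag_dir}/${file}"
}

# Collect the diagnostic commands into one directory and print its path.
# Usage: collect_diagnostics [OUTPUT_DIR]
collect_diagnostics() {
  diag_dir="${1:-nodegroup-diag-$(date +%Y%m%d-%H%M%S)}"
  mkdir -p "$diag_dir" || return 1
  _diag_run nodes.txt kubectl get nodes -o wide
  _diag_run describe-nodes.txt kubectl describe nodes
  _diag_run pods.txt kubectl get pods -A -o wide
  _diag_run top-nodes.txt kubectl top nodes
  _diag_run events.txt kubectl get events -A --sort-by=.lastTimestamp
  echo "$diag_dir"
}
```

Attach the printed directory (for example as a tar archive) to the ticket together with the NodeGroup and cluster details.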
Support Escalation
- Level 1: Self-service (documentation, basic solutions)
- Level 2: Support ticket with detailed information
- Level 3: Emergency escalation for production-critical issues