NodeGroup Troubleshooting

Common NodeGroup issues and step-by-step solutions.

NodeGroup Creation Issues

Creation Button Disabled

Symptoms: "Create NodeGroup" button grayed out or unresponsive

Causes:

Cluster is not in the "Active" state
NodeGroup creation is not available after the cluster has been deployed
Insufficient permissions
vCloud quota limits reached

Solutions:

Verify that the cluster status is "Active"
Contact support for help with NodeGroup creation
Consider recreating the cluster with the desired NodeGroups

NodeGroup Creation Failures

Symptoms: NodeGroup stuck in "Creating" status, or creation times out

Solutions:

Allow 10-15 minutes for creation
Check vCloud resource quotas
Verify network and security settings
Try different instance types or zones
Contact support for persistent failures

NodeGroup Scaling Issues

Scaling Operations Not Working

Symptoms: NodeGroup stuck in "Scaling" status or desired count not reached

Scale-Up Issues:

Check vCloud compute quota
Verify availability zone capacity
Try different instance types
Scale in smaller increments

Scale-Down Issues:

Manually drain the nodes first
Check for pod disruption budgets that block eviction
Ensure attached storage volumes can be detached
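
The manual drain step above can be sketched with standard kubectl commands. This is a minimal sketch, assuming kubectl access to the cluster; the function name `drain_node` and the 300-second timeout are illustrative choices, not values from the platform.

```shell
# Hypothetical helper: cordon and drain one node before scaling down.
# Argument: a node name as shown by `kubectl get nodes`.
drain_node() {
  if [ -z "$1" ]; then
    echo "usage: drain_node <node-name>" >&2
    return 2
  fi
  # List pod disruption budgets first; a budget that allows
  # 0 disruptions will block eviction of the pods it covers.
  kubectl get poddisruptionbudgets --all-namespaces
  # Stop new pods from being scheduled onto the node.
  kubectl cordon "$1" || return 1
  # Evict pods. DaemonSet pods are skipped and emptyDir data is discarded.
  kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data --timeout=300s
}

# Usage (node name is a placeholder):
#   drain_node worker-node-1
```

If the drain times out, the pod disruption budget listing printed at the start is usually the first place to look.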

Unexpected Node Counts

Symptoms: Actual node count doesn't match the desired count

Solutions:

Allow time for automatic reconciliation
Check individual node health in Kubernetes
Contact support for persistent discrepancies
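
To compare the actual node count with the desired one from the Kubernetes side, something like the following may help. It is a sketch under assumptions: the function names are hypothetical, and it counts every node in the cluster, because the label that identifies a NodeGroup's nodes depends on the platform and is not given here.

```shell
# Count nodes whose STATUS column is exactly "Ready".
# "NotReady" and "Ready,SchedulingDisabled" nodes are not counted.
count_ready_nodes() {
  kubectl get nodes --no-headers |
    awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Compare the Ready count with the desired count (first argument).
check_node_count() {
  desired="$1"
  actual="$(count_ready_nodes)"
  if [ "$actual" -eq "$desired" ]; then
    echo "OK: $actual of $desired nodes Ready"
  else
    echo "MISMATCH: $actual of $desired nodes Ready"
    return 1
  fi
}

# Usage:
#   check_node_count 3
```

For nodes that are missing from the Ready count, `kubectl describe node <node-name>` shows the conditions and recent events for that node.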

NodeGroup Deletion Issues

Delete Button Disabled

Protection Analysis:

Last NodeGroup Check: Is this the last remaining NodeGroup?
Master NodeGroup Check: Is this a master/control-plane NodeGroup?
Status Check: Is NodeGroup status "Ready"?
Role Check: Is this a worker NodeGroup?

Solutions by Protection Type:

Last NodeGroup: Create additional NodeGroup before deletion
Master NodeGroup: Master NodeGroups cannot be deleted (permanent protection)
Status: Wait for status to become "Ready"
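
The protection checks above can be read as a short decision sequence. The sketch below only illustrates that reading: the function name `deletion_blocker`, its arguments, and the order in which the checks run are assumptions, not the platform's actual logic.

```shell
# Hypothetical illustration of the protection checks listed above.
# Arguments: <number of NodeGroups in the cluster> <role: master|worker> <status>
deletion_blocker() {
  count="$1"
  role="$2"
  status="$3"
  if [ "$role" = "master" ]; then
    # Permanent protection: no workaround.
    echo "blocked: master NodeGroups cannot be deleted"
  elif [ "$count" -le 1 ]; then
    echo "blocked: last NodeGroup, create another NodeGroup first"
  elif [ "$status" != "Ready" ]; then
    echo "blocked: status is $status, wait for Ready"
  else
    echo "deletable"
  fi
}

# Usage:
#   deletion_blocker 2 worker Scaling
```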

Deletion Process Failures

Symptoms: Deletion starts but fails to complete

Solutions:

Manually remove dependencies (pods, volumes, load balancers)
Force pod evacuation if needed
Contact support for stuck deletions
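
Before removing dependencies by hand, it helps to see what is still attached. The following is a minimal sketch using standard kubectl commands; the function name `list_node_dependencies` is hypothetical, and it has to be run once per node in the NodeGroup.

```shell
# List resources that may keep a node (and its NodeGroup) from being deleted.
# Argument: a node name as shown by `kubectl get nodes`.
list_node_dependencies() {
  node="$1"
  # Pods still scheduled on the node.
  kubectl get pods --all-namespaces --field-selector "spec.nodeName=$node" -o wide
  # Volume attachments; the NODE column shows where each volume is attached.
  kubectl get volumeattachments
  # Services of type LoadBalancer (TYPE is the third column here).
  kubectl get services --all-namespaces --no-headers |
    awk '$3 == "LoadBalancer"'
}

# Usage (node name is a placeholder):
#   list_node_dependencies worker-node-1
```

If pods that are not managed by a controller remain on the node, `kubectl drain <node-name> --force` evicts them as well; such pods are not recreated elsewhere, so use it with care.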

NodeGroup Status Issues

Status Not Updating

Symptoms: NodeGroup status appears outdated or inconsistent

Solutions:

Use refresh button or reload page
Clear browser cache and cookies
Allow 30-60 seconds for status propagation
Try different browser or incognito mode

Stuck Status Conditions

Common Stuck States:

Creating: Allow 15-20 minutes, contact support if stuck
Scaling: Allow 10-15 minutes, check resource availability
Error: Check error details, may require NodeGroup recreation

Performance Issues

Poor NodeGroup Performance

Analysis Areas:

Resource utilization (CPU/memory usage)
Instance types and hardware selection
Network latency and throughput
Storage performance characteristics

Solutions:

Upgrade to higher performance instance types
Distribute workloads across multiple NodeGroups
Optimize pod resource requests and limits
Use anti-affinity to spread workloads
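
As a starting point for the resource utilization analysis, the output of `kubectl top nodes` can be filtered for heavily loaded nodes. This is a sketch, assuming the metrics-server add-on is installed; the function name `busy_nodes` and the default threshold of 80 percent are arbitrary illustrative choices.

```shell
# Print nodes whose CPU% or MEMORY% is at or above a threshold (default 80).
# Expects the usual `kubectl top nodes` columns:
#   NAME  CPU(cores)  CPU%  MEMORY(bytes)  MEMORY%
busy_nodes() {
  threshold="${1:-80}"
  kubectl top nodes | awk -v t="$threshold" '
    NR > 1 {                      # skip the header row
      cpu = $3; mem = $5
      sub(/%/, "", cpu); sub(/%/, "", mem)
      if (cpu + 0 >= t + 0 || mem + 0 >= t + 0)
        print $1, "cpu=" cpu "%", "mem=" mem "%"
    }'
}

# Usage:
#   busy_nodes 90
```

Nodes that show up here repeatedly are candidates for a larger instance type or for moving workloads to another NodeGroup.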

Getting Support

Information to Gather

NodeGroup Information:

NodeGroup name and ID
Cluster name and ID
Current status and error messages
NodeGroup configuration details

Diagnostic Commands (if kubectl access available):

# Check node status
kubectl get nodes -o wide

# Check pod distribution
kubectl get pods -A -o wide

# Check resource usage (requires the metrics-server add-on)
kubectl top nodes

# Check recent events
kubectl get events -A --sort-by='.lastTimestamp' | tail -20
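
To attach the output of the commands above to a support ticket, they can be collected into files. This is a minimal sketch; the function name `collect_nodegroup_diagnostics`, the default directory name, and the file names are illustrative assumptions.

```shell
# Run the diagnostic commands and save each output to its own file.
# Argument (optional): output directory, default "nodegroup-diagnostics".
collect_nodegroup_diagnostics() {
  out_dir="${1:-nodegroup-diagnostics}"
  mkdir -p "$out_dir" || return 1
  # Errors are captured too, so a failing command is visible in its file.
  kubectl get nodes -o wide > "$out_dir/nodes.txt" 2>&1
  kubectl get pods -A -o wide > "$out_dir/pods.txt" 2>&1
  kubectl top nodes > "$out_dir/top-nodes.txt" 2>&1
  kubectl get events -A --sort-by='.lastTimestamp' > "$out_dir/events.txt" 2>&1
  echo "Diagnostics written to $out_dir"
}

# Usage:
#   collect_nodegroup_diagnostics ./diag-output
```

Review the files before sharing them, since pod and node listings can contain internal names and IP addresses.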

Support Escalation

Level 1: Self-service (documentation, basic solutions)
Level 2: Support ticket with detailed information
Level 3: Emergency escalation for production-critical issues
