15 Advanced Kubernetes Interview Questions with Detailed Answers (2025 Edition)

Kubernetes has become the de facto standard for container orchestration, but mastering its complexities requires understanding edge cases and production scenarios that go beyond basic tutorials. Whether you’re preparing for a senior DevOps engineer interview or want to deepen your Kubernetes expertise, these advanced questions will test your knowledge of real-world scenarios.

Why These Kubernetes Questions Matter

Senior Kubernetes engineers encounter these scenarios in production environments daily. Understanding these edge cases can mean the difference between smooth deployments and costly downtime. Let’s dive into 15 challenging questions that separate junior developers from seasoned professionals.

Question 1: InitContainer Failures with Never Restart Policy

Question: If you have a Pod with initContainers that fail, but the main container has restartPolicy: Never, what happens to the Pod status?

Answer: When an initContainer fails and the Pod has restartPolicy: Never, the failed init container is not retried. The Pod shows an Init:Error status and its phase moves to Failed; the main container never starts, because initContainers must complete successfully before the main containers can begin. (Init:CrashLoopBackOff appears only with restartPolicy: Always or OnFailure, where the init container is retried with backoff.)

Key points:

  • InitContainers run sequentially and must succeed
  • With restartPolicy: Never, failed initContainers won’t restart
  • The Pod becomes permanently stuck in a failed init state
  • You’ll need to delete and recreate the Pod to resolve this

apiVersion: v1
kind: Pod
metadata:
  name: init-fail-demo
spec:
  restartPolicy: Never
  initContainers:
  - name: init-container
    image: busybox
    command: ['sh', '-c', 'exit 1']  # This will fail
  containers:
  - name: main-container
    image: nginx  # This will never start
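
A quick way to confirm the stuck state and recover (the Pod name init-fail-demo comes from the example above):

kubectl get pod init-fail-demo        # STATUS: Init:Error
kubectl describe pod init-fail-demo   # Events show the init container exiting with code 1
kubectl delete pod init-fail-demo     # Only way out: delete and recreate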

Question 2: StatefulSet Pod Deletion and Renaming

Question: When using a StatefulSet with 3 replicas and you delete replica-1, will replica-2 and replica-3 be renamed to maintain sequential ordering?

Answer: No, Kubernetes does not rename existing StatefulSet Pods. If you delete myapp-1, only that specific Pod gets recreated with the same name. myapp-2 and myapp-3 retain their original names.

StatefulSet naming behavior:

  • Pod names are persistent and ordinal-based (myapp-0, myapp-1, myapp-2)
  • When a Pod is deleted, it’s recreated with the same name and ordinal
  • Existing Pods are never renamed to fill gaps
  • This maintains stable network identities and persistent storage associations

This is crucial for applications requiring stable network identities like databases or distributed systems.
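
A minimal way to observe this, assuming a StatefulSet named myapp whose Pods are labeled app=myapp:

kubectl get pods -l app=myapp   # myapp-0, myapp-1, myapp-2
kubectl delete pod myapp-1
kubectl get pods -l app=myapp   # myapp-1 comes back with the same name; myapp-0 and myapp-2 are untouched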

Question 3: DaemonSet Scheduling on Tainted Master Nodes

Question: Can a DaemonSet Pod be scheduled on a master node that has NoSchedule taint without explicitly adding tolerations?

Answer: No, DaemonSet Pods cannot be scheduled on nodes with NoSchedule taints unless they have matching tolerations. However, there’s an important exception:

The DaemonSet controller automatically adds tolerations for:

  • node.kubernetes.io/not-ready
  • node.kubernetes.io/unreachable
  • node.kubernetes.io/disk-pressure
  • node.kubernetes.io/memory-pressure
  • node.kubernetes.io/pid-pressure
  • node.kubernetes.io/network-unavailable

For control-plane nodes, which carry the node-role.kubernetes.io/control-plane:NoSchedule taint (older clusters used node-role.kubernetes.io/master), you must explicitly add:

spec:
  template:
    spec:
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      - key: node-role.kubernetes.io/master  # Needed only on older clusters
        operator: Exists
        effect: NoSchedule
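
To confirm which taints a node carries and whether the DaemonSet Pods landed on it (the node name and label are placeholders):

kubectl describe node <control-plane-node> | grep Taints
kubectl get pods -o wide -l name=<daemonset-label>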

Question 4: Deployment Updates During Rolling Updates

Question: If you update a Deployment’s image while a rolling update is in progress, will Kubernetes wait for the current rollout to complete or start a new one immediately?

Answer: Kubernetes immediately starts a new rollout; it does not wait for the in-progress one to finish. The Deployment documentation calls this rollover (multiple updates in flight).

What happens:

  1. Current rolling update stops immediately
  2. New ReplicaSet is created for the updated image
  3. Previous ReplicaSet (from the interrupted rollout) begins scaling down
  4. New ReplicaSet scales up according to the rolling update strategy

You can observe this with:

kubectl rollout status deployment/myapp
kubectl rollout history deployment/myapp

This can lead to more Pods than expected during the transition period, so monitor resource usage carefully.
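
To watch the rollover at the ReplicaSet level (assuming the Deployment's Pods are labeled app=myapp):

kubectl get rs -l app=myapp --watch   # The newest ReplicaSet scales up while the older ones scale down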

Question 5: Pod Eviction Timing and Control

Question: When a node becomes NotReady, how long does it take for Pods to be evicted, and can this be controlled per Pod?

Answer: By default, Pods are evicted about 5 minutes (300 seconds) after a node becomes NotReady. In current Kubernetes versions this is driven by taint-based eviction: the node controller applies the node.kubernetes.io/not-ready and node.kubernetes.io/unreachable NoExecute taints, and the DefaultTolerationSeconds admission plugin gives every Pod matching tolerations with tolerationSeconds: 300. (The kube-controller-manager's --pod-eviction-timeout flag only applies to the legacy, non-taint-based eviction path.)

Per-Pod control options:

  • Toleration with tolerationSeconds: Control how long a Pod tolerates the not-ready/unreachable taints before being evicted
  • PodDisruptionBudgets: Limit how many Pods can be taken down at once during voluntary disruptions such as node drains (taint-based eviction does not go through the PDB mechanism)
  • Priority and preemption: During scheduling, higher-priority Pods can preempt (evict) lower-priority ones when resources are scarce

Example toleration:

tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 60  # Evict after 60 seconds instead of 300
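
A PodDisruptionBudget, mentioned above, protects against voluntary disruptions such as kubectl drain rather than taint-based eviction. A minimal sketch, assuming Pods labeled app=myapp:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: myapp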

Question 6: Multiple Containers Sharing Localhost Ports

Question: Is it possible for a Pod to have multiple containers sharing the same port on localhost, and what happens if they try to bind simultaneously?

Answer: No, multiple containers in the same Pod cannot bind to the same port on localhost simultaneously. Since containers in a Pod share the same network namespace, they share the same IP address and port space.

What happens:

  • The first container successfully binds to the port
  • The second container gets a “port already in use” error
  • The failing container may crash or go into CrashLoopBackOff

Solutions:

  1. Use different ports for each container
  2. Use a sidecar proxy pattern
  3. Configure one container as the primary port handler

# This will cause conflicts
containers:
- name: app1
  ports:
  - containerPort: 8080
- name: app2
  ports:
  - containerPort: 8080  # Conflict!
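
The simplest fix (solution 1 above) is to give each container its own port:

containers:
- name: app1
  ports:
  - containerPort: 8080
- name: app2
  ports:
  - containerPort: 8081  # No conflict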

Question 7: ReadWriteOnce PVC Multi-Pod Access

Question: If you create a PVC with ReadWriteOnce access mode, can multiple Pods on the same node access it simultaneously?

Answer: Yes, in general. ReadWriteOnce (RWO) is a node-level restriction, not a Pod-level one.

Technical details:

  • RWO specification: The volume can be mounted read-write by a single node
  • Same-node access: Multiple Pods scheduled onto that node can use the volume at the same time
  • Pod-level enforcement: If you need to guarantee single-Pod access, use the ReadWriteOncePod access mode (stable since Kubernetes 1.29, requires a CSI driver)

Safe approaches:

  • Use ReadWriteMany (RWX) when Pods on different nodes need shared access
  • Use ReadWriteOncePod to enforce single-Pod access
  • Use StatefulSets for predictable single-Pod-per-volume relationships
  • Test your specific storage provider's behavior; enforcement details can vary

# For shared access across nodes
accessModes:
- ReadWriteMany  # Instead of ReadWriteOnce
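
If the intent is the opposite, guaranteeing that only one Pod can use the volume, a ReadWriteOncePod claim is a minimal sketch of that (the standard StorageClass name is an assumption; a CSI driver is required):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-pod-claim
spec:
  accessModes:
  - ReadWriteOncePod
  storageClassName: standard
  resources:
    requests:
      storage: 1Gi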

Question 8: HPA Behavior with Unavailable Metrics Server

Question: When using Horizontal Pod Autoscaler with custom metrics, what happens if the metrics server becomes unavailable during high load?

Answer: When the metrics server becomes unavailable, HPA enters a degraded state:

Behavior during metrics unavailability:

  • HPA stops making scaling decisions
  • Current replica count is maintained
  • No scale-up occurs even during high load
  • Events show “unable to get metrics” errors

Recovery behavior:

  • Once metrics are available again, HPA resumes normal operation
  • It may trigger rapid scaling based on accumulated load
  • Consider using multiple metrics sources for redundancy

Monitoring considerations:

kubectl get hpa
kubectl describe hpa myapp-hpa

Best practices:

  • Monitor metrics server health
  • Set up alerts for HPA failures
  • Consider backup scaling strategies (manual intervention procedures)
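
To soften the rapid scale-up that can follow a metrics outage, the autoscaling/v2 behavior field can rate-limit scaling. A minimal sketch with illustrative values (the target Deployment name myapp is assumed):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 20
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60   # Wait before acting on a burst of high metrics
      policies:
      - type: Pods
        value: 4                       # Add at most 4 Pods per period
        periodSeconds: 60
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70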

Question 9: Port-Forward to CrashLoopBackOff Pods

Question: Can you run kubectl port-forward to a Pod that’s in CrashLoopBackOff state, and will it work?

Answer: It depends on the timing and Pod restart behavior:

During container restart interval: kubectl port-forward may work briefly if you catch the Pod between restarts and the container is temporarily running.

When container is down: Port-forward fails immediately with connection errors.

Practical approach:

# This usually fails
kubectl port-forward pod/failing-pod 8080:8080

# Better approach - port-forward to a service
kubectl port-forward service/myapp-service 8080:8080

For debugging CrashLoopBackOff:

  • Use kubectl logs pod-name --previous to see crash logs
  • Check container startup probes and resource limits
  • Consider temporarily removing liveness probes for debugging
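
If you need an interactive shell despite the crashes, kubectl debug can copy the Pod and run a shell instead of the crashing command (names are placeholders):

# Creates a copy named failing-pod-debug whose main container runs sh instead of the original command
kubectl debug failing-pod -it --copy-to=failing-pod-debug --container=main-container -- sh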

Question 10: ServiceAccount Deletion Impact

Question: If a ServiceAccount is deleted while Pods using it are still running, what happens to the mounted tokens and API access?

Answer: The Pods keep running, but their API credentials stop working sooner than many people expect:

Immediate effects:

  • Running Pods: Keep running; the token file remains mounted in the container filesystem
  • API access: The API server checks that the ServiceAccount behind a token still exists, so requests authenticated with that token start failing with 401 Unauthorized
  • New Pods: Cannot be created referencing the deleted ServiceAccount

Token behavior:

  • Kubernetes does not remove the mounted token file from running Pods
  • With bound ServiceAccount tokens (the default since Kubernetes 1.21+), the kubelet can no longer refresh the token because its ServiceAccount is gone
  • Applications that call the API server will see authentication failures until the ServiceAccount is recreated and the Pods are restarted to pick up new tokens

Recovery steps:

# Recreate the ServiceAccount
kubectl create serviceaccount myapp-sa

# Restart Pods to get new tokens
kubectl rollout restart deployment/myapp

Question 11: Anti-Affinity Scheduling Deadlocks

Question: When using anti-affinity rules, is it possible to create a “deadlock” where no new Pods can be scheduled?

Answer: Yes, overly restrictive anti-affinity rules can create scheduling deadlocks:

Common deadlock scenarios:

  • RequiredDuringSchedulingIgnoredDuringExecution with insufficient nodes
  • Zone anti-affinity with limited availability zones
  • Combination of multiple affinity rules creating impossible constraints

Example deadlock:

# If you have only 2 nodes and request 3 Pods with this rule
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: myapp
      topologyKey: kubernetes.io/hostname

Solutions:

  • Use preferredDuringSchedulingIgnoredDuringExecution instead of required (see the sketch after this list)
  • Ensure adequate node diversity
  • Monitor Pod scheduling events
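
A sketch of the preferred (soft) variant from the first solution above; the scheduler spreads Pods when it can but still schedules them when it cannot:

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: myapp
        topologyKey: kubernetes.io/hostname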

Question 12: Job Failure Handling with Parallelism

Question: If you have a Job with parallelism: 3 and one Pod fails with restartPolicy: Never, will the Job create a replacement Pod?

Answer: Yes, the Job controller will create a replacement Pod to maintain the desired parallelism level.

Job behavior with failures:

  • restartPolicy: Never: Failed Pods are not restarted, but new Pods are created
  • Parallelism maintenance: Job ensures the specified number of Pods are running
  • Completion tracking: Job tracks successful completions vs. failures

Example configuration:

apiVersion: batch/v1
kind: Job
metadata:
  name: worker-job
spec:
  parallelism: 3
  completions: 10
  backoffLimit: 6        # Default; total Pod failures tolerated before the Job is marked Failed
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox
        command: ['sh', '-c', 'echo working && sleep 5']

The Job will continue creating new Pods until it reaches the completion count or hits the backoff limit.
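
You can watch the Job controller replace failed Pods and track failures (the Job name worker-job comes from the example above):

kubectl get pods -l job-name=worker-job --watch
kubectl describe job worker-job   # Pods Statuses: Active / Succeeded / Failed, plus failure events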

Question 13: Resource Requests vs. Limits During OOM

Question: Can a Pod’s resource requests be modified after creation, and what’s the difference between requests and limits during OOM scenarios?

Answer: Resource modification: Historically, a Pod's resource requests and limits could not be changed after creation; you had to recreate the Pod or rely on VPA (Vertical Pod Autoscaler). Newer Kubernetes releases (1.27+) add in-place Pod resize behind the InPlacePodVerticalScaling feature, which allows CPU and memory to be adjusted on a running Pod where that feature is enabled.

OOM behavior differences:

  • Requests: Used for scheduling decisions, guaranteed resources
  • Limits: Maximum resources allowed; memory limits are enforced by the cgroup OOM killer and CPU limits by CFS throttling

During OOM scenarios:

  1. Container exceeds limits: Container is immediately killed (OOMKilled)
  2. Node memory pressure: Pods exceeding requests are candidates for eviction
  3. Priority-based eviction: Lower priority Pods are evicted first

resources:
  requests:
    memory: "64Mi"     # Guaranteed; used for scheduling
    cpu: "250m"
  limits:
    memory: "128Mi"    # Maximum allowed; exceeding this gets the container OOMKilled
    cpu: "500m"

Question 14: Network Policy Default Egress Behavior

Question: When using network policies, if you don’t specify egress rules, are outbound connections blocked by default?

Answer: Only if the policy declares Egress in its policyTypes. A NetworkPolicy restricts egress when Egress appears in policyTypes (explicitly, or implicitly because the policy contains egress rules); if it then specifies no egress rules, all outbound traffic from the selected Pods is blocked.

NetworkPolicy behavior:

  • No NetworkPolicy: All traffic allowed (default)
  • NetworkPolicy with only ingress rules (policyTypes: [Ingress]): Egress remains open
  • policyTypes includes Egress but no egress rules are given: All egress blocked
  • Empty egress array with Egress in policyTypes: All egress blocked

Example blocking all egress:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-egress
spec:
  podSelector:
    matchLabels:
      app: secure-app
  policyTypes:
  - Egress
  # No egress rules = deny all egress
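
In practice a default-deny egress policy is usually paired with an explicit allowance for DNS so the selected Pods can still resolve names; a minimal sketch:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
spec:
  podSelector:
    matchLabels:
      app: secure-app
  policyTypes:
  - Egress
  egress:
  - ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53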

Question 15: PV Corruption Cross-Namespace Impact

Question: If a Persistent Volume gets corrupted, can multiple PVCs bound to it cause cascading failures across different namespaces?

Answer: Not through PVC binding itself: a PV can be bound to only one PVC at a time, so multiple PVCs (from any namespace) can never share a single PV. Cascading cross-namespace failures are still possible, however, when PVs used in different namespaces sit on the same corrupted or degraded storage backend.

Scenarios for cross-namespace impact:

  • Shared storage backend: Multiple PVs carved from the same underlying storage system
  • ReadWriteMany volumes: Many Pods across nodes mounting the same volume, so corruption hits all of them at once
  • Storage class dependencies: Shared storage infrastructure behind a common StorageClass

Cascading failure patterns:

  1. Data corruption spreads: Applications in multiple namespaces fail
  2. Storage backend overload: Performance degradation affects all PVs
  3. Backup system failures: Corrupt data propagates to backups

Prevention strategies:

# Use namespace- or tier-specific storage classes
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: namespace-a-storage
provisioner: kubernetes.io/gce-pd  # Example: in-tree GCE PD provisioner matching the parameters below
parameters:
  zone: us-west1-a
  type: pd-ssd

  • Implement proper backup and disaster recovery
  • Use separate storage backends for critical namespaces
  • Monitor storage health across all namespaces

Conclusion: Mastering Advanced Kubernetes Concepts

These 15 advanced Kubernetes questions represent real-world scenarios that senior engineers encounter in production environments. Understanding these edge cases and failure modes is crucial for:

  • Production reliability: Preventing cascading failures and downtime
  • Efficient troubleshooting: Quickly identifying root causes
  • System design: Making informed architectural decisions
  • Career advancement: Demonstrating senior-level Kubernetes expertise

Continue practicing with these scenarios in test environments, and you’ll be well-prepared for both advanced Kubernetes interviews and production challenges.

Key takeaways:

  • Always test edge cases in non-production environments
  • Monitor and alert on unusual Pod states and resource usage
  • Understand the implications of every configuration choice
  • Keep disaster recovery and failure scenarios in mind when designing systems

For more advanced Kubernetes content and DevOps insights, bookmark this guide and continue expanding your container orchestration expertise.

Akhilesh Mishra

I am Akhilesh Mishra, a self-taught DevOps engineer with 11+ years of experience working on private and public cloud (GCP & AWS) technologies.

I also mentor DevOps aspirants on their journey into DevOps by providing guided learning and mentorship.

Topmate: https://topmate.io/akhilesh_mishra/