12/14/2021

5 Tips for Troubleshooting Kubernetes Deployments

Written by
Copado Team

As DevOps teams embrace containerization to support the speed and flexibility needed for CI/CD (continuous integration/continuous delivery) implementations, Kubernetes has quickly become the most popular container orchestration platform. In Kubernetes, individual containers are packaged together as pods, which you can automatically deploy, scale, and manage across entire clusters.

When we hear the term “automatic,” we tend to think “easy,” but Kubernetes disproves this notion. There are numerous errors you can run into while deploying a Kubernetes pod, any of which can cause it to fail. Let’s discuss 5 tips for troubleshooting Kubernetes deployments.

Troubleshooting Kubernetes Deployments

So, you’ve deployed a Kubernetes pod, but how do you know if it’s up and running? The first step for troubleshooting Kubernetes deployments is to proactively verify deployed pods are in a ‘ready’ state. To do this, run the command:

                kubectl get pods

This command will show you all pods in the current namespace. The output will tell you if each pod is ready, its current status, how many times it has restarted, and its age (i.e. the length of time since it was created). If your pod isn’t running, then the STATUS column will likely list an error message.

If you need to troubleshoot a specific pod, you should run:

                kubectl get pod [podname]

However, keep in mind that if your pod is part of a Kubernetes deployment, its name will have the replica set hash and a pod ID appended. You can learn this information from the “get pods” command or, if you only want a list of the replica sets, run:

                kubectl get rs

Here are five of the most common Kubernetes pod deployment error messages, along with tips for troubleshooting and fixing the underlying issues that cause them.

Kubernetes Deployment Error #1: CreateContainerConfigError

If the status of your pod is CreateContainerConfigError, that usually means Kubernetes couldn’t find the Secret or ConfigMap. A Secret is a Kubernetes object that stores sensitive information the pod needs to authenticate, like database credentials. A ConfigMap is pretty much what it sounds like—a “map” containing the pod’s configuration information. If your new pod can’t mount the Secrets or ConfigMaps specified, it won’t be able to deploy.
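For context, a pod references these objects in its manifest. The sketch below shows where that happens; the names (app-pod, configmap-1, secret-1) and the image are illustrative placeholders, not values from a real cluster:

```yaml
# Hypothetical pod manifest: deployment fails with
# CreateContainerConfigError if configmap-1 or secret-1 doesn't exist.
apiVersion: v1
kind: Pod
metadata:
  name: app-pod
spec:
  containers:
    - name: app
      image: nginx:1.21
      envFrom:
        - configMapRef:
            name: configmap-1   # must exist in the same namespace
        - secretRef:
            name: secret-1      # must exist in the same namespace
```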

To find out what’s missing, run the command:

                kubectl describe pod [name]  

Where [name] is the name of your pod without the brackets.

If a Secret or ConfigMap is missing, you’ll receive an error that looks like this:

                Error: configmap “configmap-1” not found

Or

                Error: secret “secret-1” not found

This tells you which ConfigMap or Secret you need to find or replace. To see if the missing Secret or ConfigMap exists in your cluster, you can run a version of this command (substituting the correct info as needed):

                kubectl get configmap configmap-1

If the command returns an error like “Error from server (NotFound),” the ConfigMap or Secret is missing. You then either need to create it or correct the name of the ConfigMap that your pod requests during deployment.

Bonus tip: make sure you double-check the name and spelling of the missing ConfigMap or Secret in your error message. Sometimes the CreateContainerConfigError is caused by a simple typo in the name!
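If the ConfigMap really is missing, a minimal manifest to create it might look like this sketch; the key/value pairs are placeholders for your own configuration:

```yaml
# Hypothetical ConfigMap matching the name from the error message above.
apiVersion: v1
kind: ConfigMap
metadata:
  name: configmap-1
data:
  DB_HOST: db.example.com   # placeholder values
  LOG_LEVEL: info
```

Save it to a file and apply it with “kubectl apply -f configmap.yaml”, then redeploy the pod.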

Kubernetes Deployment Error #2: CrashLoopBackOff

Another common Kubernetes deployment error is CrashLoopBackOff. This status means your container is crashing repeatedly, and Kubernetes is waiting a progressively longer “back-off” period before each restart attempt. Unfortunately, the underlying cause could be quite a few things, including an issue with mounting a volume, a failing liveness probe, or insufficient resources on the node. If you get the CrashLoopBackOff error, you’ll need to troubleshoot your Kubernetes deployment by running through a list of common causes and eliminating them one by one.

To narrow down the possible causes of your CrashLoopBackOff error, you can look in the pod details for clues. Run the command:

                kubectl describe pod [name]  

If you get a “Liveness probe failed” and “Back-off restarting failed container” message, that can mean there aren’t enough resources available. While you may be tempted to fix this problem by adjusting “periodSeconds” or “timeoutSeconds” to give your application a longer window of time to respond before shutting down, doing so could mask legitimate performance issues. You should check the configurations of the liveness probe to see what triggered the failure and whether the probe is properly configured for your use case.
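As a sketch, the probe settings mentioned above live in the container spec. The endpoint, port, and timings here are illustrative assumptions, not recommended values:

```yaml
# Illustrative liveness probe; adjust the timings only after ruling out
# real performance problems in the application itself.
livenessProbe:
  httpGet:
    path: /healthz         # hypothetical health endpoint
    port: 8080
  initialDelaySeconds: 15  # give the app time to start before probing
  periodSeconds: 10        # how often the probe runs
  timeoutSeconds: 5        # how long to wait for a response
  failureThreshold: 3      # consecutive failures before a restart
```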

If you don’t get the “Liveness probe failed” output from your pod details, then you’ll need to keep digging. The pod details may contain more clues. Under the “Last State: Terminated” section, check the Reason, Message, and Exit Code to see if there’s any helpful information. For example, if the Reason is “OOMKilled,” then the container tried to use more memory than it was allowed. To resolve this, you would have to adjust the pod’s resource requests and limits.
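Those requests and limits also live in the container spec. This fragment is only an example; the numbers must be tuned to your application’s real footprint:

```yaml
# Example resource settings for a container that was OOMKilled.
resources:
  requests:
    memory: "256Mi"   # what the scheduler reserves for the container
    cpu: "250m"
  limits:
    memory: "512Mi"   # exceeding this gets the container OOMKilled
    cpu: "500m"
```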

The next step is to check the logs from your previous container instance to see if there are any clues in there. After finding the pod’s name with “kubectl get pods,” you can use this command to see the last ten lines of the crashed container’s log:

                kubectl logs [podname] --previous --tail=10

You can also check the deployment logs by running:

                kubectl logs -f deploy/[deployment-name] -n [namespace]

The deployment logs will tell you about issues at the application level, such as volume mounting errors.

Kubernetes Deployment Error #3: ImagePullBackOff or ErrImagePull

If you see the ImagePullBackOff or ErrImagePull status, that means your pod couldn’t pull its container image from the registry. The underlying cause could be an incorrect image name or tag, or an authentication issue due to a problem with the Secret. To determine which it is, run the command:

                kubectl describe pod [name]

You can expect to see one of these outputs:

  • A “wrong image name or tag” message indicates you likely typed the wrong name or tag in the pod manifest. To fix the issue, confirm the correct image name (for example, by running “docker pull” against the registry) and correct it in the pod manifest.
  • An “authentication issue in container registry” message indicates the pod either has a problem with the Secret that holds its registry credentials or doesn’t have the right permissions to pull the image. Correcting that problem should get rid of the ImagePullBackOff or ErrImagePull error.
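For the authentication case, the pod (or its service account) needs to reference a registry-credentials Secret. The names, registry, and image below are placeholders for illustration:

```yaml
# Hypothetical pod spec fragment referencing a docker-registry Secret,
# e.g. one created beforehand with:
#   kubectl create secret docker-registry regcred \
#     --docker-server=registry.example.com \
#     --docker-username=<user> --docker-password=<password>
spec:
  containers:
    - name: app
      image: registry.example.com/team/app:1.0   # placeholder image
  imagePullSecrets:
    - name: regcred
```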

Kubernetes Deployment Error #4: Unknown

If you see the status Unknown, that usually means the worker node has shut down or crashed, so all of the pods residing on that node are unavailable. After five minutes (by default), Kubernetes will change the status of all pods scheduled on that node to Unknown and will attempt to schedule those pods on a different node.

If this is your issue, when you run the “kubectl get pod [name]” command, you’ll see the affected pod listed twice. One will have the status Unknown. The other will have the status ContainerCreating, indicating that Kubernetes is attempting to schedule your pod on a working node.

In that output, you can check the NODE column to see the name of both the crashed node and the working node. Using the name of the crashed node, run the command:

                kubectl get node [name]

This output will likely indicate that the node is in NotReady status. If that’s the case, then the issue will probably resolve itself when the failed node is able to recover and rejoin the cluster. The original pod with the Unknown status will be deleted. The second pod that was automatically scheduled on a different node will finish deploying.

If that process is taking too long, or if the node doesn’t automatically recover, then you can manually delete the failed node. This will allow Kubernetes to finish deploying the new pod on the successful node. Simply run the command:

                kubectl delete node [name]

Kubernetes Deployment Error #5: No resources found

When you run your kubectl get pod [name] command, you might get an output that just says:

                No resources found.

That can mean your new pod requested more CPU or memory than the limits your cluster administrator set, so it was rejected before it was ever created. You can verify those limits using the command:

                kubectl describe limitrange

If this is the issue affecting your Kubernetes pod deployment, you have two options: ask your cluster administrator to increase the limit or reduce the requested resources to something below the existing limit.
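For reference, those namespace-level limits are typically defined in a LimitRange object. This is only a sketch; the name and values are illustrative:

```yaml
# Illustrative LimitRange; containers requesting more than "max" are
# rejected at admission, which can leave you with no pod to inspect.
apiVersion: v1
kind: LimitRange
metadata:
  name: resource-limits
spec:
  limits:
    - type: Container
      max:
        cpu: "1"
        memory: "1Gi"
      default:            # applied when a container sets no limit
        cpu: "500m"
        memory: "512Mi"
```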

Bonus tip: Make sure you have specified the correct namespace. If you don’t, kubectl get pod [name] will only show you what’s in the default namespace.

More Help Troubleshooting Kubernetes Deployments

These are the issues you’re most likely to run into when you deploy your Kubernetes pods, but it’s far from an exhaustive list. If you can’t find the cause of your problem or if you need more extensive Kubernetes troubleshooting, then you shouldn’t be afraid to ask the experts for help. Copado’s team of DevOps experts is here to help you achieve digital transformation through containerization, Kubernetes orchestration, cloud-native architecture development, and more.

 
