Komodor case study


Kubernetes troubleshooting

By default, you communicate with the Kubernetes API using kubectl commands. You use these commands to create resources and other objects, and you also use them for troubleshooting. This is where things can get difficult. Your pods can run into errors such as CreateContainerConfigError, ImagePullBackOff, and CrashLoopBackOff, and a node can end up in the Not Ready state. Sometimes a pod's status is simply Unknown. How do you fix this?

To investigate, you can run commands such as kubectl describe pod [pod-name] or kubectl logs [pod-name]. These commands return a lot of information about your object, and you need to go through all of it to find out what is wrong.

The challenges with troubleshooting

Now let's look at the problems with Kubernetes troubleshooting. The best way to understand the challenge is to troubleshoot a faulty pod yourself. Let's use Killercoda's Kubernetes Playground to see how this works.

As you can see, we've created a single pod called kubeworld running an nginx image, and it's running as expected.

Now let's create another pod, but this time we will give it an image called random-image. This is not a valid image, so the pod will report an error.
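The original screenshots for these two steps are not reproduced here. As a rough sketch, the pods could be recreated with kubectl run; the exact commands used in the playground are an assumption, and create_demo_pods is a hypothetical helper name. The commands are wrapped in a function so that nothing is created until you call it against a disposable cluster.

```shell
# Hypothetical helper: recreates the two demo pods described above.
# Assumes kubectl is installed and pointed at a disposable cluster.
create_demo_pods() {
  # A pod with a valid image; it should reach the Running state.
  kubectl run kubeworld --image=nginx
  # A pod whose image name does not resolve to a real image.
  kubectl run random-image --image=random-image
  # List both pods together with their current status.
  kubectl get pods
}
```

Calling create_demo_pods and looking at the STATUS column of the final listing shows how the second pod differs from the first.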

As you can see, the pod reports an image pull error, which means Kubernetes could not pull the image from the registry. In this case, we know exactly what the error is. But let's say you created a custom image that is not uploaded to DockerHub or any other registry, you defined in your configuration file where to look for this image in your local environment, and you still get this image pull error. How will you figure out what's wrong?

This is where you would use kubectl describe pod [pod-name] and kubectl logs [pod-name]. Let's run a describe on our random-image pod.

We get a long list of fields that tells us everything about the pod. If you look at the events, you can see that it is telling us that the image called random-image does not exist on DockerHub. If we try to run kubectl logs random-image, it won't work, since the container never started and no logs were created.
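The describe-then-logs routine above can be sketched as a small shell function. This is a minimal illustration, not part of the original walkthrough: inspect_pod is a hypothetical helper name, and it assumes the pod lives in the current namespace and that the Events section comes last in the describe output.

```shell
# Hypothetical helper: print the Events section of a pod's description,
# then try to fetch its logs.
inspect_pod() {
  pod="$1"
  # Print everything from the "Events:" heading to the end of the output.
  kubectl describe pod "$pod" | sed -n '/^Events:/,$p'
  # Logs only exist once a container has started; report when they do not.
  kubectl logs "$pod" || echo "no logs available for $pod"
}
```

Running inspect_pod random-image would show only the events for the broken pod, followed by a note that no logs are available.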

Now imagine having to do this for around 100 pods. That quickly becomes tedious. This is where Komodor can help make troubleshooting easier.
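To get a feel for what the manual approach looks like at that scale, here is a minimal sketch that filters the pod list down to the pods that need attention. failing_pods is a hypothetical helper name, and the filter is deliberately simple: it only looks at the STATUS column, so a pod that is Running but not ready would be missed.

```shell
# Hypothetical helper: list pods in every namespace that are not healthy.
# With --all-namespaces --no-headers the columns are:
# NAMESPACE NAME READY STATUS RESTARTS AGE
failing_pods() {
  kubectl get pods --all-namespaces --no-headers |
    awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}
```

Each line of output is a namespace/name pair followed by its status. You would still have to run a describe on every entry to find the cause, which is exactly the repetitive work described above.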

What is Komodor

Komodor is a Kubernetes troubleshooting platform that turns hours of guesswork into actionable answers in just a few clicks.

For each K8s resource, Komodor automatically constructs a coherent view, including the relevant deploys, config changes, dependencies, metrics, and past incidents. Komodor seamlessly integrates and utilizes data from cloud providers, source controls, CI/CD pipelines, monitoring tools, and incident response platforms.

How does it simplify troubleshooting

Komodor can be installed with a single helm command and connected to your Kubernetes cluster using an API key. Once you've connected your clusters to Komodor, you can view all your objects on Komodor's web dashboard.

Now if you need to troubleshoot any of your Kubernetes objects, you can click on the object and get a pop-up with information about the resource. You also have a record of all events that have occurred in your cluster, displayed in a timeline, so you can see what happened in your clusters and when.

Komodor also provides several integrations which are easy to install and can be useful depending on your requirements.

Why is it unique

What makes Komodor stand out from other troubleshooting tools is that it collects and maintains historical data for your Kubernetes resources and displays it in a web interface. A tool such as K9s, by contrast, focuses on enhancing the terminal experience.

It is a Kubernetes-native application that provides multi-cluster and multi-cloud observability and serves as a single source of truth across all your clusters.

Some success stories

Komodor has had several success stories, some of which you can find on their website. One particular story we would like to highlight is how Komodor reduced MTTR by 70% for Lacework and saved Christmas.

In short, the company was running a highly complex K8s environment with dozens of clusters, and its teams were struggling with acute inefficiencies in incident management.

The Komodor platform has helped Lacework significantly reduce MTTR and improve overall troubleshooting efficiency through end-to-end visibility, contextual insights, opinionated monitoring, and the ability to pinpoint systemic issues.

How did Komodor save Christmas? One of the senior site reliability engineers got a notification about an issue and resolved it in under 10 minutes, directly from their phone. For more details about this success story, check out the article.

Conclusion

Without Komodor, troubleshooting a Kubernetes resource can be difficult. It is still possible, but reading through all the logs and events in the terminal is very time-consuming.

Komodor helps you troubleshoot resources quickly and efficiently, without spending too much time trying to figure out what the problem is. It also integrates with a number of tools that can be useful.

Get involved