
What's Running In Your Cluster?

On this page: Reviewing the applications, services, and plugins that make up the workloads running in your cluster.

One of the most important things you’re going to need to do is understand what exactly is running in your clusters.

You now have a list of Deployments, but what are those Deployments doing? You see that there are services exposed to the Internet, but what are those services? For each, you minimally want to determine:

  • What is the application?

  • Who or what business unit owns it?

  • Where is the source code?

  • How did it get there?
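
There’s no single command that answers all of these questions, but labels and annotations are a reasonable place to start, since many teams (though by no means all) record application names, owners, and source repositories there:

> kubectl get deployments -A --show-labels
> kubectl get deployment <name> -n <namespace> -o jsonpath='{.metadata.annotations}'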

There are a lot of different tools to help you visualize what’s running in your cluster. I’m a big fan of Octant because it runs locally with minimal effect on the cluster itself.
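
If you want to try it, Octant reads your current kubeconfig and serves a dashboard locally (by default on http://127.0.0.1:7777), so nothing extra gets installed into the cluster:

> octant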

I’ve seen a lot of Kubernetes clusters, and very few ran only Kubernetes itself. Most use third-party plugins and various integrations that provide additional features.

Besides the application workloads deployed into the cluster, you’re going to need to understand the plugins and services deployed that support the cluster.

Here are a few examples to think about.

Service Mesh and Workload Identity

A service mesh is designed to facilitate network communications between services. Think Pod-to-Pod networking.

Istio, for example, has become an extremely popular service mesh that integrates with Kubernetes to provide mutual TLS encrypted network communications between services… sometimes.

Remember that while services like Istio provide security-related features such as transport security, they often aren’t relied upon as a primary security control. Their job is to connect things, not to prevent things from connecting. Istio, for example, sometimes blurs this line.

To understand the risk of the service mesh, you have to identify what it aims to provide. Is it relied upon to prevent something else from accessing the services? If so, a gap or misconfiguration in the mesh may directly increase the exploitability of those services.
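
As a concrete example: if the mesh is Istio 1.5 or newer, its PeerAuthentication objects tell you whether mutual TLS is actually enforced (STRICT) or merely opportunistic (PERMISSIVE, which is what you get when no mesh-wide policy exists). The mesh-wide policy, if one has been created, is conventionally named default in the root namespace:

> kubectl get peerauthentications.security.istio.io -A
> kubectl get peerauthentications.security.istio.io default -n istio-system -o yaml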

CNI/Networking Controls

Your network expectations within the cluster are going to be a very personal choice depending on the threat model you build. Most clusters have a network plugin, called a CNI (Container Network Interface) plugin, that handles networking controls in and out of the cluster.

Calico, Cilium, and Weave are all examples of CNIs at work within a Kubernetes cluster – and they all work differently.

For example, if you know anything about BGP (or the inherent security challenges therein), how do you feel about it being used to facilitate network communications within the cluster and across nodes?

Or what about the latest buzzword, eBPF – did you know that in some cases, when a ring buffer is full, new events are simply dropped? If you were using eBPF to monitor the syscalls hitting the kernel, you may be dropping an attacker’s activity just as they compromise your host. Everything has nuance and subtle security implications.

In my experience, the most common issue you’re going to want to keep track of is whether a CNI plugin has been enabled at all. Without one, you won’t be able to apply any in-cluster networking controls, and that may affect the overall risk. How do you know if a CNI plugin is enabled? There is no universal object to query, but you can look at the objects running in the kube-system namespace and hope to find something with an appropriate name.

Here’s how to hunt for all the CNIs I’ve heard of:

> kubectl get all -n kube-system | egrep -i "ACI|Calico|Canal|Cilium|CNI-Genie|Contiv|Contrail|Flannel|Knitter|Multus|OVN-Kubernetes|OVN4NFV-K8S-Plugin|NSX-T|Nuage|Romana|Weave"
pod/calico-kube-controllers-55ffdb7658-mljpb   1/1     Running   1          5m41s
pod/calico-node-cd4mg                          1/1     Running   0          5m41s
daemonset.apps/calico-node   1         1         1       1            1           kubernetes.io/os=linux   5m52s
deployment.apps/calico-kube-controllers   1/1     1            1           5m52s
replicaset.apps/calico-kube-controllers-55ffdb7658   1         1         1       5m41s
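
If you also have shell access to a node, the CNI plugin drops its configuration on disk; an empty directory here is another hint that no CNI plugin is installed (the path below is the conventional default, though kubelets can be configured to look elsewhere):

> ls /etc/cni/net.d/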

Access Control Plugins and OIDC Integrations

Kubernetes supports a built-in access control system that uses certificates or tokens. It also has native OIDC support, which allows it to integrate with Single Sign-On providers or cloud IAM controls.
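
On clusters where the API server runs as a static pod (kubeadm-style), you can check whether OIDC is actually wired up by looking for the --oidc-* flags. Managed offerings (EKS, GKE, AKS) don’t expose these pods, so you’d check the provider’s console instead:

> kubectl get pods -n kube-system -l component=kube-apiserver -o yaml | grep oidc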

If you’ve already audited who has access to the cluster, the goal here is to assess the access controls themselves.

If it’s using Kubernetes authorization controls, cluster-admins will be able to add other users; if it’s using cloud IAM, you’ll need to review the cloud provider’s security controls instead.

At a minimum, make sure you can speak to how the clusters provide authentication and authorization.
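
A quick sanity check is to ask the API server what a given identity is actually allowed to do; run this with the kubeconfig or token that each class of user would really be using:

> kubectl auth can-i --list
> kubectl auth can-i create pods --as=system:serviceaccount:default:default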

One of my favorite things to explain to people is that most Kubernetes clusters do not have a good process for certificate revocation. Authentication is usually stateless -- you provide a token or certificate, and the server simply validates that it has been signed. If you know anything about stateless authentication, you may know that one of its weaknesses is that there is no built-in revocation mechanism.

This means that if you've granted a user access to the cluster, they will be authorized to log in for the lifetime of the certificate authority. In many (most) clusters, that default value is 10 years. That intern you hired 8 years ago may still be able to access your cluster and there's nothing you can do about it without rebuilding the cluster.
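
If you have access to the control plane (or to the CA material), you can check those lifetimes directly. The paths and commands below assume a kubeadm-style cluster; managed offerings keep the CA out of your reach:

> openssl x509 -in /etc/kubernetes/pki/ca.crt -noout -enddate
> kubeadm certs check-expiration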

Here are a few ways to hunt for who has had, or currently has, access to the cluster:

Look for logged events from the API server (on clusters where it runs as a labeled pod in kube-system):

> kubectl logs -n kube-system -l component=kube-apiserver

Review the ClusterRoleBindings and RoleBindings to see what subjects are applied:

> kubectl get clusterrolebindings.rbac.authorization.k8s.io cluster-admin -o yaml | kubectl-neat
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: cluster-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:masters

Here’s the problem. There really are no “users” or “groups” in Kubernetes. The “kind: Group” that you see above actually refers to the “Organization” field of a certificate… that was generated by a user… that you don’t have access to.
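
What you can do is enumerate every subject referenced by a binding, which at least tells you which certificate Organizations, groups, users, and ServiceAccounts the cluster currently trusts:

> kubectl get clusterrolebindings -o custom-columns='NAME:.metadata.name,ROLE:.roleRef.name,SUBJECTS:.subjects[*].name'
> kubectl get rolebindings -A -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,ROLE:.roleRef.name,SUBJECTS:.subjects[*].name'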

The best advice here is to rebuild your cluster if you don’t know who has access, and to make sure you leverage some kind of OIDC integration going forward.

Policy Tools and Admission Controllers

A strong security policy is always going to win out over ad-hoc security tools, plugins, or analyzers.

This is why Kubernetes has built an API that allows owners to restrict the types of Kubernetes objects that are created in the cluster.

Here’s a simple example: many admins believe that Kubernetes can safely handle multiple users/tenants inside the cluster. But without policy controls, tenants often have everything they need to bypass the security features of a container and take control of an entire Node if they want to.
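
To make that concrete, here’s a minimal, hypothetical Pod spec that an unconstrained tenant could submit. Nothing in a default cluster blocks it, and it hands the tenant the node’s process namespace and root filesystem -- exactly the class of object an admission policy should reject:

apiVersion: v1
kind: Pod
metadata:
  name: node-takeover
spec:
  hostPID: true
  containers:
  - name: shell
    image: alpine
    command: ["sleep", "infinity"]
    securityContext:
      privileged: true
    volumeMounts:
    - name: host
      mountPath: /host
  volumes:
  - name: host
    hostPath:
      path: /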

For example, in my Shmoocon Talk on Kubernetes security, I demonstrated a variety of attacks on the misconfigurations of clusters… Not zero days or leet sploitz, but realistic attack vectors caused by a lack of policy controls.

How do you prevent this?

Kubernetes says: Admission controllers. They are a critical control for maintaining sanity of the objects being deployed into the cluster. At a basic level, an admission controller can block objects that don’t adhere to a policy or modify objects so that they do.
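
Even before reaching for extra tooling, you can list the validating and mutating webhook configurations registered with the API server:

> kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations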

If you aren’t sure what admission controllers you have running, the view-webhook plugin (installable via Krew) can help you analyze the webhooks you’re currently configured to use:

> kubectl view-webhook