Tag: Kubernetes security best practices

  • Kubernetes Security Best Practices by Ian Lewis

    TL;DR: Kubernetes is powerful but inherently complex, and securing it requires a proactive, layered approach. From RBAC to Pod Security Standards, and tools like Falco and Prometheus, this guide covers production-tested strategies to harden your Kubernetes clusters. A security-first mindset isn’t optional—it’s a necessity for DevSecOps teams.

    Quick Answer: Kubernetes security hinges on principles like least privilege, network segmentation, and continuous monitoring. Implement RBAC, Pod Security Standards, and vulnerability scanning to safeguard your clusters.

    Introduction: Why Kubernetes Security Matters

    Imagine Kubernetes as the control tower of a bustling airport. It orchestrates the takeoff and landing of containers, ensuring everything runs smoothly. But what happens when the control tower itself is compromised? Chaos. Kubernetes has become the backbone of modern cloud-native applications, but its complexity introduces unique security challenges that can’t be ignored.

    With the rise of Kubernetes in production environments, attackers have shifted their focus to exploiting misconfigurations, unpatched vulnerabilities, and insecure defaults. For DevSecOps teams, securing Kubernetes isn’t just about ticking boxes—it’s about building a fortress capable of withstanding real-world threats. A security-first mindset is no longer optional; it’s foundational.

    Organizations adopting Kubernetes often face a steep learning curve when it comes to security. The platform’s flexibility and extensibility are double-edged swords: while they enable innovation, they also open doors to potential misconfigurations. For example, leaving the Kubernetes API server exposed to the internet without proper authentication can lead to catastrophic breaches. This underscores the importance of understanding and implementing security best practices from day one.

    Furthermore, the shared responsibility model in Kubernetes environments adds another layer of complexity. While cloud providers may secure the underlying infrastructure, the onus is on the user to secure workloads, configurations, and access controls. This article aims to equip you with the knowledge and tools to navigate these challenges effectively.

    Core Principles of Kubernetes Security

    Securing Kubernetes starts with understanding its core principles. These principles act as the bedrock for any security strategy, ensuring that your clusters are resilient against attacks.

    Least Privilege Access and Role-Based Access Control (RBAC)

    Think of RBAC as the bouncer at a nightclub. It ensures that only authorized individuals get access to specific areas. In Kubernetes, RBAC defines who can do what within the cluster. Misconfigured RBAC policies are a common attack vector, so it’s crucial to follow the principle of least privilege. Pairing RBAC with Pod Security Standards gives you defense in depth.

    For example, granting a service account cluster-admin privileges when it only needs read access to a specific namespace is a recipe for disaster. Instead, create granular roles tailored to specific use cases. Here’s a practical example:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      namespace: default
      name: pod-reader
    rules:
    - apiGroups: [""]
      resources: ["pods"]
      verbs: ["get", "list"]

    The above configuration creates a role that allows read-only access to pods. Pair this with a RoleBinding to assign it to a specific user or service account:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: read-pods-binding
      namespace: default
    subjects:
    - kind: User
      name: jane-doe
      apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: Role
      name: pod-reader
      apiGroup: rbac.authorization.k8s.io

    This RoleBinding ensures that the user jane-doe can only read pod information in the default namespace.

    💡 Pro Tip: Regularly audit your RBAC policies to ensure they align with the principle of least privilege. Use tools like RBAC Manager to simplify this process.

    Network Segmentation and Pod-to-Pod Communication Policies

    Network policies in Kubernetes are like building walls in an open-plan office. Without them, everyone can hear everything. By default, Kubernetes allows unrestricted communication between pods, which is a security nightmare. Implementing network policies ensures that pods can only communicate with authorized endpoints.

    For instance, consider a scenario where your database pods should accept connections only from your application pods. A network policy can enforce this restriction:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-app-traffic
      namespace: default
    spec:
      podSelector:
        matchLabels:
          app: my-database
      policyTypes:
      - Ingress
      ingress:
      - from:
        - podSelector:
            matchLabels:
              app: my-app

    This policy selects pods labeled app: my-database and allows ingress to them only from pods labeled app: my-app in the same namespace; all other inbound traffic to the database pods is dropped. Without such policies, a compromised pod could potentially reach sensitive resources such as the database directly.

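    It’s also essential to test your network policies to ensure they work as intended, for example by starting a throwaway pod without the allowed label and confirming that its connection attempts fail. kubectl describe networkpolicy shows how a policy’s selectors and rules were interpreted, while Hubble provides real-time network flow monitoring on Cilium-based clusters.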

    💡 Pro Tip: Start with a default deny-all policy and incrementally add rules to allow necessary traffic. This approach minimizes the attack surface.
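    A default deny-all policy for one namespace can be sketched as follows, assuming the default namespace used in the examples above and a CNI plugin that enforces NetworkPolicy. The empty podSelector matches every pod in the namespace, and listing both policy types without any rules blocks all ingress and egress.

    ```yaml
    # Default deny-all for the "default" namespace (adjust the namespace as needed)
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: default-deny-all
      namespace: default
    spec:
      # An empty selector matches every pod in the namespace
      podSelector: {}
      # Both types listed with no rules: nothing in, nothing out
      policyTypes:
      - Ingress
      - Egress
    ```

    Once egress is denied, a pod-to-pod connection needs both an egress allowance on the client side and an ingress allowance on the server side, so pair this with explicit allow rules for the traffic you do want, including DNS.
    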

    Securing the Kubernetes API Server and etcd

    The Kubernetes API server is the brain of the cluster, and etcd is its memory. Compromising either is catastrophic. Always enable authentication and encryption for API server communication. For etcd, use TLS encryption and restrict access to trusted IPs.

    For example, you can enable API server audit logging to monitor access attempts:

    apiVersion: audit.k8s.io/v1
    kind: Policy
    rules:
    - level: Metadata
      resources:
      - group: ""
        resources: ["pods"]

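    This policy records request metadata (user, timestamp, resource, and verb, but not request or response bodies) for pod-related API requests. It takes effect only once the API server is started with --audit-policy-file pointing at it and an audit backend such as --audit-log-path configured.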

    💡 Pro Tip: Use Kubernetes’ built-in encryption providers to encrypt sensitive data at rest in etcd. This adds an extra layer of security.
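    A minimal encryption configuration for Secrets might look like the sketch below. The file is passed to the API server with --encryption-provider-config; the key value is a placeholder you must generate yourself (32 random bytes, base64-encoded), and a KMS provider is generally preferable in production because it keeps the key off the control-plane disk.

    ```yaml
    apiVersion: apiserver.config.k8s.io/v1
    kind: EncryptionConfiguration
    resources:
      - resources:
          - secrets
        providers:
          # The first provider listed is used to encrypt new writes
          - secretbox:
              keys:
                - name: key1
                  # Placeholder: replace with a base64-encoded 32-byte key
                  secret: REPLACE_WITH_BASE64_32_BYTE_KEY
          # identity lets the API server still read Secrets written before encryption was enabled
          - identity: {}
    ```

    Existing Secrets stay unencrypted until they are rewritten, which you can force with kubectl get secrets --all-namespaces -o json | kubectl replace -f -.
    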

    Production-Tested Security Practices

    Beyond the core principles, there are specific practices that have been battle-tested in production environments. These practices address common vulnerabilities and ensure your cluster is ready for real-world challenges.

    Regular Vulnerability Scanning for Container Images

    Container images are often the weakest link in the security chain. Tools like Trivy, Grype, and Clair can scan images for known vulnerabilities. Integrate these tools into your CI/CD pipeline to catch issues early.

    # Scan an image with Grype
    grype my-app-image:latest

    Address any critical vulnerabilities before deploying the image to production.

    For example, if a scan reveals a critical vulnerability in a base image, consider switching to a minimal base image like distroless or Alpine. These images have smaller attack surfaces, reducing the likelihood of exploitation.

    💡 Pro Tip: Automate vulnerability scanning in your CI/CD pipeline and fail builds if critical issues are detected. This ensures vulnerabilities are addressed before deployment.
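    As one way to wire this up, the GitHub Actions workflow below builds an image and runs Grype with --fail-on critical, which makes Grype exit non-zero (failing the job) when it finds a vulnerability of critical severity. It assumes grype is already installed on a self-hosted runner; the workflow and image names are hypothetical.

    ```yaml
    # Hypothetical workflow: build an image, then block on critical findings
    name: image-scan
    on:
      push:
        branches: [main]
    permissions:
      contents: read
    jobs:
      scan:
        # Assumes a self-hosted runner with docker and grype on PATH
        runs-on: self-hosted
        steps:
          - uses: actions/checkout@v4
          - name: Build image
            run: docker build -t my-app-image:${{ github.sha }} .
          # --fail-on exits 1 if a finding at or above this severity exists
          - name: Scan image with Grype
            run: grype my-app-image:${{ github.sha }} --fail-on critical
    ```
    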

    Implementing Pod Security Standards (PSS) and Admission Controllers

    Pod Security Standards define three security profiles for pods: Privileged, Baseline, and Restricted. Policy engines such as OPA Gatekeeper or Kyverno, running as admission controllers, can enforce these standards alongside your own custom rules.

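    apiVersion: constraints.gatekeeper.sh/v1beta1
    kind: K8sPSPPrivilegedContainer
    metadata:
      name: restrict-privileged-pods
    spec:
      match:
        kinds:
        - apiGroups: [""]
          kinds: ["Pod"]

    This constraint rejects Pods that request privileged containers. It assumes the matching K8sPSPPrivilegedContainer ConstraintTemplate from the Gatekeeper policy library is already installed; without the template, the constraint kind does not exist.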

    Admission controllers can also enforce other security policies, such as requiring image signing or disallowing containers from running as root. These measures significantly enhance cluster security.
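    Kubernetes also ships a built-in Pod Security Admission controller (stable since v1.25) that enforces the Pod Security Standards per namespace through labels, with no extra components. The sketch below uses a hypothetical namespace name: enforce rejects violating pods, while warn and audit only surface violations, which helps when tightening an existing namespace.

    ```yaml
    apiVersion: v1
    kind: Namespace
    metadata:
      name: my-app        # hypothetical namespace
      labels:
        # Reject pods that violate the Restricted profile
        pod-security.kubernetes.io/enforce: restricted
        # "latest" tracks the cluster version; pin a minor version (e.g. v1.30) for stability
        pod-security.kubernetes.io/enforce-version: latest
        # Also warn clients and annotate audit events on violations
        pod-security.kubernetes.io/warn: restricted
        pod-security.kubernetes.io/audit: restricted
    ```
    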

    Monitoring and Incident Response

    Even the best security measures can fail. Monitoring and incident response are your safety nets, ensuring that you can detect and mitigate issues quickly.

    Setting Up Audit Logs and Monitoring Suspicious Activities

    Enable Kubernetes audit logs to track API server activities. Ship them with a log collector such as Fluentd to a store such as Elasticsearch, where you can aggregate and analyze them for anomalies.
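    The API server evaluates audit policy rules in order and applies the first match, so rule order matters. The policy below is a sketch of that idea rather than a complete production policy: it keeps Secret and ConfigMap payloads out of the logs, records request bodies for write operations on everything else, and logs remaining requests at the Metadata level.

    ```yaml
    apiVersion: audit.k8s.io/v1
    kind: Policy
    # Skip the RequestReceived stage to reduce event volume
    omitStages:
      - "RequestReceived"
    rules:
      # Must come first: Secrets and ConfigMaps are logged at Metadata only,
      # so their contents never reach the audit log
      - level: Metadata
        resources:
          - group: ""
            resources: ["secrets", "configmaps"]
      # Record request bodies for writes (e.g. RBAC changes, new workloads)
      - level: Request
        verbs: ["create", "update", "patch", "delete"]
      # Catch-all: who did what and when, without bodies
      - level: Metadata
    ```
    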

    Leveraging Tools Like Falco and Prometheus

    Falco is a runtime security tool that detects suspicious behavior in your cluster. Pair it with Prometheus for metrics-based monitoring.

    💡 Pro Tip: Create custom Falco rules tailored to your application’s behavior to reduce noise from false positives.
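    As an illustration, the custom rule below narrows the common "shell in a container" detection to one namespace and one workload, so an alert there is more likely to be meaningful. It assumes the spawned_process and container macros from Falco’s default ruleset are loaded; the namespace and pod name prefix are hypothetical.

    ```yaml
    # Hypothetical custom Falco rule; relies on default-ruleset macros
    - rule: Shell Spawned in my-app Pod
      desc: Detect an interactive shell starting inside the my-app workload
      condition: >
        spawned_process and container
        and proc.name in (bash, sh, zsh, ash)
        and k8s.ns.name = "default"
        and k8s.pod.name startswith "my-app"
      output: >
        Shell spawned in my-app pod (user=%user.name pod=%k8s.pod.name
        ns=%k8s.ns.name image=%container.image.repository cmd=%proc.cmdline)
      priority: WARNING
      tags: [container, shell, custom]
    ```
    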

    Creating an Incident Response Plan Tailored for Kubernetes

    Develop a Kubernetes-specific incident response plan. Include steps for isolating compromised pods, rolling back deployments, and restoring etcd backups.
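    For the isolation step, one approach is a pre-deployed quarantine policy that matches a label responders add during an incident, for example with kubectl label pod <pod-name> quarantine=true; the label name is a hypothetical convention. Because NetworkPolicies are additive, this only cuts a pod off if no other policy still allows its traffic, so responders typically also remove the workload labels other policies select on. That also detaches the pod from its ReplicaSet, which starts a replacement while the original is kept for forensics.

    ```yaml
    # Hypothetical quarantine policy, deployed ahead of time in each namespace
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: quarantine
      namespace: default
    spec:
      # Matches only pods a responder has labeled quarantine=true
      podSelector:
        matchLabels:
          quarantine: "true"
      # No ingress or egress rules are listed, so this policy allows nothing;
      # other policies that still select the pod can re-open traffic
      policyTypes:
      - Ingress
      - Egress
    ```
    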

    Future-Proofing Kubernetes Security

    Security is a moving target. As Kubernetes evolves, so do the threats. Future-proofing your security strategy ensures that you’re prepared for what’s next.

    Staying Updated with the Latest Kubernetes Releases and Patches

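    Always run supported Kubernetes versions and apply patches promptly. Subscribe to security advisories from the Kubernetes Security Response Committee (formerly the Product Security Committee), for example through the kubernetes-security-announce mailing list.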

    Adopting Emerging Tools and Practices for DevSecOps

    Keep an eye on emerging tools like Chainguard for secure container images and Sigstore for image signing. These tools address gaps in the current security landscape.

    Fostering a Culture of Continuous Improvement in Security

    Security isn’t a one-time effort. Conduct regular security reviews, encourage knowledge sharing, and invest in training for your team.

    Frequently Asked Questions

    What is the most critical aspect of Kubernetes security?

    RBAC and network policies are foundational. Without them, your cluster is vulnerable to unauthorized access and lateral movement.

    How often should I scan container images?

    Scan images during every build in your CI/CD pipeline and periodically for images already in production.

    Can I rely on default Kubernetes settings for security?

    No. Default settings prioritize usability over security. Always customize configurations to meet your security requirements.

    What tools can help with Kubernetes runtime security?

    Tools like Falco, Sysdig, and Aqua Security provide runtime protection by monitoring and alerting on suspicious activities.

    Conclusion: Building a Security-First Kubernetes Culture

    Kubernetes security is a journey, not a destination. By adopting a security-first mindset and implementing the practices outlined here, you can build resilient clusters capable of withstanding modern threats. Remember, security isn’t optional—it’s foundational.

    Here’s what to remember:

    • Always implement RBAC and network policies.
    • Scan container images regularly and address vulnerabilities.
    • Use tools like Falco and Prometheus for monitoring.
    • Stay updated with the latest Kubernetes releases and patches.

    Have questions or tips to share? Drop a comment or reach out on Twitter. Let’s make Kubernetes security a priority, together.

    📋 Disclosure: Some links in this article are affiliate links. If you purchase through these links, I earn a small commission at no extra cost to you. I only recommend products I’ve personally used or thoroughly evaluated. This helps support orthogonal.info and keeps the content free.

  • Mastering Kubernetes Security: Network Policies &

    Network policies are the single most impactful security control you can add to a Kubernetes cluster — and most clusters I audit don’t have a single one. After implementing network segmentation across enterprise clusters with hundreds of namespaces, I’ve developed a repeatable approach that works. Here’s the playbook I use.

    Introduction to Kubernetes Security Challenges

    📌 TL;DR: Explore production-proven strategies for securing Kubernetes with network policies and service mesh, focusing on a security-first approach to DevSecOps.

    According to a recent CNCF survey, 67% of organizations now run Kubernetes in production, yet only 23% have implemented pod security standards. This statistic is both surprising and alarming, highlighting how many teams prioritize functionality over security in their Kubernetes environments.

    Kubernetes has become the backbone of modern infrastructure, enabling teams to deploy, scale, and manage applications with unprecedented ease. But with great power comes great responsibility—or in this case, great security risks. From misconfigured RBAC roles to overly permissive network policies, the attack surface of a Kubernetes cluster can quickly spiral out of control.

    If you’re like me, you’ve probably seen firsthand how a single misstep in Kubernetes security can lead to production incidents, data breaches, or worse. The good news? By adopting a security-first mindset and using tools like network policies and service meshes, you can significantly reduce your cluster’s risk profile.

    One of the biggest challenges in Kubernetes security is the sheer complexity of the ecosystem. With dozens of moving parts—pods, nodes, namespaces, and external integrations—it’s easy to overlook critical vulnerabilities. For example, a pod running with excessive privileges or a namespace with unrestricted access can act as a gateway for attackers to compromise your entire cluster.

    Another challenge is the dynamic nature of Kubernetes environments. Applications are constantly being updated, scaled, and redeployed, which can introduce new security risks. Without robust monitoring and automated security checks, it’s nearly impossible to keep up with these changes and ensure your cluster remains secure.

    💡 Pro Tip: Regularly audit your Kubernetes configurations using tools like kube-bench and kube-hunter. These tools can help you identify misconfigurations and vulnerabilities before they become critical issues.

    Network Policies: Building a Secure Foundation

    🔍 Lesson learned: When I first deployed network policies in a production cluster, I locked out the monitoring stack — Prometheus couldn’t scrape metrics, Grafana dashboards went dark, and the on-call engineer thought the cluster was down. Always test with a canary namespace first, and explicitly allow your observability traffic before applying default-deny.

    Network policies are one of Kubernetes’ most underrated security features. They allow you to define how pods communicate with each other and with external services, effectively acting as a firewall within your cluster. Without network policies, every pod can talk to every other pod by default—a recipe for disaster in production.

    To implement network policies effectively, you need to start by understanding your application’s communication patterns. Which services need to talk to each other? Which ones should be isolated? Once you’ve mapped out these interactions, you can define network policies to enforce them.

    Here’s an example of a basic network policy that restricts ingress traffic to a pod:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-specific-ingress
      namespace: my-namespace
    spec:
      podSelector:
        matchLabels:
          app: my-app
      policyTypes:
      - Ingress
      ingress:
      - from:
        - podSelector:
            matchLabels:
              app: trusted-app
        ports:
        - protocol: TCP
          port: 8080

    This policy ensures that only pods labeled app: trusted-app can send traffic to my-app on port 8080. It’s a simple yet powerful way to enforce least privilege.

    However, network policies can become complex as your cluster grows. For example, managing policies across multiple namespaces or environments can lead to configuration drift. To address this, consider using tools like Calico or Cilium, which provide advanced network policy management features and integrations.

    Another common use case for network policies is restricting egress traffic. For instance, you might want to prevent certain pods from accessing external resources like the internet. Here’s an example of a policy that blocks all egress traffic:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: deny-egress
      namespace: my-namespace
    spec:
      podSelector:
        matchLabels:
          app: my-app
      policyTypes:
      - Egress
      egress: []

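    This deny-all egress policy ensures that the specified pods cannot initiate any outbound connections, including DNS lookups to the cluster DNS service. That adds a strong layer of containment, but it also breaks name resolution inside those pods.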
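    If those pods still need name resolution, a second policy can allow DNS alone; because policies are additive, the pods can then reach the cluster DNS service and nothing else. This sketch assumes CoreDNS runs in kube-system with the k8s-app: kube-dns label (the default in kubeadm-based clusters) and relies on the kubernetes.io/metadata.name label that Kubernetes sets on every namespace.

    ```yaml
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-dns-egress
      namespace: my-namespace
    spec:
      podSelector:
        matchLabels:
          app: my-app
      policyTypes:
      - Egress
      egress:
      - to:
        # Both selectors in one entry: DNS pods inside kube-system only
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
        ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    ```
    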

    💡 Pro Tip: Start with a default deny-all policy and explicitly allow traffic as needed. This forces you to think critically about what communication is truly necessary.
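    Following the lesson learned at the start of this section, a sketch of an observability allow rule to apply before default-deny might look like this. The monitoring namespace, the app.kubernetes.io/name: prometheus pod label, and a container port named metrics are all assumptions to adapt to your stack; if the monitoring namespace is also default-deny, Prometheus needs a matching egress rule there.

    ```yaml
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-prometheus-scrape
      namespace: my-namespace
    spec:
      # Applies to every pod in the namespace
      podSelector: {}
      policyTypes:
      - Ingress
      ingress:
      - from:
        # Prometheus pods in the monitoring namespace (both selectors must match)
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
          podSelector:
            matchLabels:
              app.kubernetes.io/name: prometheus
        ports:
        # Named port: matches containers that declare a port called "metrics"
        - protocol: TCP
          port: metrics
    ```
    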

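    Troubleshooting: If your network policies aren’t working as expected, check the network plugin you’re using. NetworkPolicy objects are enforced by the CNI plugin, not by Kubernetes itself: plugins without policy support (Flannel on its own, for example) accept the objects silently and enforce nothing, and other plugins may have limitations or require additional configuration.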

    Service Mesh: Enhancing Security at Scale

    ⚠️ Tradeoff: A service mesh like Istio adds powerful security features (mTLS, traffic policies) but also adds significant operational complexity. Sidecar proxies consume memory and CPU on every pod. In resource-constrained clusters, I’ve seen the mesh overhead exceed 15% of total cluster resources. For smaller deployments, network policies alone may be the right call.

    While network policies are great for defining communication rules, they don’t address higher-level concerns like encryption, authentication, and observability. This is where service meshes come into play. A service mesh provides a layer of infrastructure for managing service-to-service communication, offering features like mutual TLS (mTLS), traffic encryption, and detailed telemetry.

    Popular service mesh solutions include Istio, Linkerd, and Consul. Each has its strengths, but Istio stands out for its strong security features. For example, Istio can automatically encrypt all traffic between services using mTLS, ensuring that sensitive data is protected even within your cluster.

    Here’s an example of enabling mTLS in Istio:

    apiVersion: security.istio.io/v1beta1
    kind: PeerAuthentication
    metadata:
      name: default
      namespace: istio-system
    spec:
      mtls:
        mode: STRICT

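    Because it is placed in Istio’s root namespace (istio-system by default) and has no workload selector, this policy applies mesh-wide: every workload with an Istio sidecar accepts only mutual-TLS traffic, so clients outside the mesh can no longer reach them. To scope STRICT mode to a single namespace instead, create the same resource in that namespace.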

    In addition to mTLS, service meshes offer features like traffic shaping, retries, and circuit breaking. These capabilities can improve the resilience and performance of your applications while also enhancing security. For example, connection-pool limits and outlier detection in an Istio DestinationRule cap how much load a single service accepts, and Envoy’s rate-limiting filters can throttle request rates, reducing the impact of denial-of-service attempts.
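    As a sketch of the load-limiting side, the DestinationRule below sets connection-pool limits and outlier detection (Istio’s circuit-breaking settings) for a hypothetical my-app service. It caps concurrency rather than enforcing a true requests-per-second limit, and the numbers are placeholders to tune against your own traffic.

    ```yaml
    apiVersion: networking.istio.io/v1beta1
    kind: DestinationRule
    metadata:
      name: my-app-limits
      namespace: my-namespace
    spec:
      host: my-app.my-namespace.svc.cluster.local
      trafficPolicy:
        connectionPool:
          tcp:
            # Upper bound on TCP connections to the service
            maxConnections: 100
          http:
            # Requests queued beyond this are rejected with 503
            http1MaxPendingRequests: 50
            maxRequestsPerConnection: 10
        outlierDetection:
          # Eject an endpoint after 5 consecutive 5xx responses
          consecutive5xxErrors: 5
          interval: 30s
          baseEjectionTime: 60s
    ```
    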

    Another advantage of service meshes is their observability features. Tools like Jaeger and Kiali integrate smoothly with service meshes, providing detailed insights into service-to-service communication. This can help you identify and troubleshoot security issues, such as unauthorized access or unexpected traffic patterns.

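    ⚠️ Security Note: Meshes such as Istio rotate short-lived workload certificates automatically, but the CA certificates that sign them still need a planned rotation before they expire. An expired CA certificate can cause mesh-wide downtime and security vulnerabilities.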

    Troubleshooting: If you’re experiencing issues with mTLS, check the Istio control plane logs for errors. Common problems include misconfigured certificates or incompatible protocol versions.

    Integrating Network Policies and Service Mesh for Maximum Security

    Network policies and service meshes are powerful on their own, but they truly shine when used together. Network policies provide coarse-grained control over communication, while service meshes offer fine-grained security features like encryption and authentication.

    To integrate both in a production environment, start by defining network policies to restrict pod communication. Then, layer on a service mesh to handle encryption and observability. This two-pronged approach ensures that your cluster is secure at both the network and application layers.

    Here’s a step-by-step guide:

    • Define network policies for all namespaces, starting with a deny-all default.
    • Deploy a service mesh like Istio and configure mTLS for all services.
    • Use the service mesh’s observability features to monitor traffic and identify anomalies.
    • Iteratively refine your policies and configurations based on real-world usage.
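    To complement the network policy layer with identity-based rules at the mesh layer, an Istio AuthorizationPolicy can restrict callers by their mTLS identity rather than by pod labels. In this sketch the trusted-app service account is hypothetical, cluster.local is Istio’s default trust domain, and the policy depends on mTLS being in place, because the source principal comes from the client certificate.

    ```yaml
    apiVersion: security.istio.io/v1beta1
    kind: AuthorizationPolicy
    metadata:
      name: allow-trusted-app
      namespace: my-namespace
    spec:
      # Applies to the my-app workload's sidecar
      selector:
        matchLabels:
          app: my-app
      # With an ALLOW policy in place, requests that match no rule are denied
      action: ALLOW
      rules:
      - from:
        - source:
            # Identity format: <trust-domain>/ns/<namespace>/sa/<service-account>
            principals: ["cluster.local/ns/my-namespace/sa/trusted-app"]
        to:
        - operation:
            ports: ["8080"]
    ```
    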

    One real-world example of this integration is securing a multi-tenant Kubernetes cluster. By using network policies to isolate tenants and a service mesh to encrypt traffic, you can achieve a high level of security at scale, provided you budget for the sidecar overhead described in the tradeoff note above.

    💡 Pro Tip: Test your configurations in a staging environment before deploying to production. This helps catch misconfigurations that could lead to downtime.

    Troubleshooting: If you’re seeing unexpected traffic patterns, use the service mesh’s observability tools to trace the source of the issue. This can help you identify misconfigured policies or unauthorized access attempts.

    Monitoring, Testing, and Continuous Improvement

    Securing Kubernetes is not a one-and-done task—it’s a continuous journey. Monitoring and testing are critical to maintaining a secure environment. Tools like Prometheus, Grafana, and Jaeger can help you track metrics and visualize traffic patterns, while security scanners like kube-bench and Trivy can identify vulnerabilities.

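    Automating security testing in your CI/CD pipeline is another must. For example, you can use Trivy to scan container images for vulnerabilities before deploying them; --exit-code 1 makes the command fail the pipeline step when HIGH or CRITICAL findings are present:

    trivy image --exit-code 1 --severity HIGH,CRITICAL my-app:latest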

    Finally, make iterative improvements based on threat modeling and incident analysis. Every security incident is an opportunity to learn and refine your approach.

    Another critical aspect of continuous improvement is staying informed about the latest security trends and vulnerabilities. Subscribe to security mailing lists, follow Kubernetes release notes, and participate in community forums to stay ahead of emerging threats.

    💡 Pro Tip: Schedule regular security reviews to ensure your configurations and policies stay up-to-date with evolving threats.

    Troubleshooting: If your monitoring tools aren’t providing the insights you need, consider integrating additional plugins or custom dashboards. For example, you can use Grafana Loki for centralized log management and analysis.

    Securing Kubernetes RBAC and Secrets Management

    While network policies and service meshes address communication and encryption, securing Kubernetes also requires robust Role-Based Access Control (RBAC) and secrets management. Misconfigured RBAC roles can grant excessive permissions, while poorly managed secrets can expose sensitive data.

    Start by auditing your RBAC configurations. Use the principle of least privilege to ensure that users and service accounts only have the permissions they need. Here’s an example of a minimal RBAC role for a read-only user:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      namespace: my-namespace
      name: read-only
    rules:
    - apiGroups: [""]
      resources: ["pods"]
      verbs: ["get", "list", "watch"]

    For secrets management, consider using tools like HashiCorp Vault or Kubernetes Secrets Store CSI Driver. These tools provide secure storage and access controls for sensitive data like API keys and database credentials.
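    Whichever backend you choose, any Secrets that still live in the cluster are guarded by RBAC, and least privilege matters most there: list and watch on secrets return the full contents of every Secret in the namespace. A sketch of a Role that can read exactly one Secret, using a hypothetical Secret name, looks like this; bind it to the consuming service account with a RoleBinding.

    ```yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      namespace: my-namespace
      name: read-db-credentials
    rules:
    - apiGroups: [""]
      resources: ["secrets"]
      # Hypothetical Secret name; only this object is readable
      resourceNames: ["db-credentials"]
      # get only: list/watch would expose every Secret's data in the namespace
      verbs: ["get"]
    ```
    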

    💡 Pro Tip: Rotate your secrets regularly and monitor access logs to detect unauthorized access attempts.

    Conclusion: Security as a Continuous Journey

    This is the exact approach I use: start with default-deny network policies in every namespace, then layer on a service mesh when you need mTLS and fine-grained traffic control. Don’t skip network policies just because you plan to add a mesh later — they’re complementary, not redundant. Run kubectl get networkpolicies --all-namespaces right now. If it’s empty, that’s your first task.

    Here’s what to remember:

    • Network policies provide a strong foundation for secure communication.
    • Service meshes enhance security with features like mTLS and traffic encryption.
    • Integrating both provides defense in depth across the network and application layers.
    • Continuous monitoring and testing are critical to staying ahead of threats.
    • RBAC and secrets management are equally important for a secure cluster.

    If you have a Kubernetes security horror story—or a success story—I’d love to hear it. Drop a comment or reach out on Twitter. Next week, we’ll dive into securing Kubernetes RBAC configurations—because permissions are just as important as policies.

    References

    1. Kubernetes Documentation — “Network Policies”
    2. Cloud Native Computing Foundation (CNCF) — “The State of Cloud Native Development Report”
    3. OWASP — “Kubernetes Security Cheat Sheet”
    4. NIST — “Application Container Security Guide (SP 800-190)”
    5. GitHub — “Kubernetes Network Policy Recipes”

    Disclaimer: This article is for educational purposes. Always test security configurations in a staging environment before production deployment.

  • TeamPCP Supply Chain Attacks on Trivy, KICS & LiteLLM

    On March 17, 2026, the open-source security ecosystem experienced what I consider the most sophisticated supply chain attack since SolarWinds. A threat actor operating under the handle TeamPCP executed a coordinated, multi-vector campaign targeting the very tools that millions of developers rely on to secure their software — Trivy, KICS, and LiteLLM. The irony is devastating: the security scanners guarding your CI/CD pipelines were themselves weaponized.

    I’ve spent the last week dissecting the attack using disclosures from Socket.dev and Wiz.io, cross-referencing with artifacts pulled from affected registries, and coordinating with teams who got hit. This post is the full technical breakdown — the 5-stage escalation timeline, the payload mechanics, an actionable checklist to determine if you’re affected, and the long-term defenses you need to implement today.

    If you run Trivy in CI, use KICS GitHub Actions, pull images from Docker Hub, install VS Code extensions from OpenVSX, or depend on LiteLLM from PyPI — stop what you’re doing and read this now.

    The 5-Stage Attack Timeline

    🎯 Quick Answer: On March 17, 2026, the TeamPCP supply chain attack compromised Trivy, KICS, and LiteLLM in what I consider the most sophisticated supply chain attack since SolarWinds. It specifically targeted security tools, meaning the tools defending your pipeline were themselves backdoored.

    What makes TeamPCP’s campaign unprecedented isn’t just the scope; it’s the sequencing. Each stage was designed to exploit trust established by the previous one, creating a cascading chain of compromise that moved laterally across entirely different package ecosystems. Here’s the full timeline as reconstructed from Socket.dev’s and Wiz.io’s published analyses.

    Stage 1 — Trivy Plugin Poisoning (Late February 2026)

    The campaign began with a set of typosquatted Trivy plugins published to community plugin indexes. Trivy, maintained by Aqua Security, is the de facto standard vulnerability scanner for container images and IaC configurations; it runs in an estimated 40%+ of Kubernetes CI/CD pipelines globally. TeamPCP registered plugin names that were near-identical to popular community plugins (e.g., a lookalike of the legitimate trivy-plugin-referrer in which one character of the name in the registry metadata was replaced with a Unicode homoglyph, so the two names render identically). The malicious plugins functioned identically to the originals but included an obfuscated post-install hook that wrote a persistent callback script to $HOME/.cache/trivy/callbacks/.

    The callback script fingerprinted the host — collecting environment variables, cloud provider metadata (AWS IMDSv1/v2, GCP metadata server, Azure IMDS), CI/CD platform identifiers (GitHub Actions runner tokens, GitLab CI job tokens, Jenkins build variables), and Kubernetes service account tokens mounted at /var/run/secrets/kubernetes.io/serviceaccount/token. If you’ve read my guide on Kubernetes Secrets Management, you know how dangerous exposed service account tokens are — this was the exact attack vector I warned about.
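    One mitigation for that token exposure, sketched here for CI runner pods that never need to call the Kubernetes API, is to stop mounting the service account token at all. The ci namespace and ci-runner name are hypothetical; the same automountServiceAccountToken field can also be set in an individual Pod spec, where it takes precedence over the ServiceAccount setting.

    ```yaml
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: ci-runner        # hypothetical service account for CI job pods
      namespace: ci          # hypothetical namespace
    # Pods using this account get no token at
    # /var/run/secrets/kubernetes.io/serviceaccount/token unless they opt back in
    automountServiceAccountToken: false
    ```
    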

    Stage 2 — Docker Hub Image Tampering (Early March 2026)

    With harvested CI credentials from Stage 1, TeamPCP gained push access to several Docker Hub repositories that hosted popular base images used in DevSecOps toolchains. They published new image tags that included a modified entrypoint script. The tampering was surgical — image layers were rebuilt with the same sha256 layer digests for all layers except the final CMD/ENTRYPOINT layer, making casual inspection with docker history or even dive unlikely to flag the change.

    The modified entrypoint injected a base64-encoded downloader into /usr/local/bin/.health-check, disguised as a container health monitoring agent. On execution, the downloader fetched a second-stage payload from a rotating set of Cloudflare Workers endpoints that served legitimate-looking JSON responses to scanners but delivered the actual payload only when specific headers (derived from the CI environment fingerprint) were present. This is a textbook example of why SBOM and Sigstore verification aren’t optional — they’re survival equipment.
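    As one way to put signature verification into practice at deploy time, the Kyverno ClusterPolicy sketched below rejects Pods whose images from a given registry lack a valid Cosign signature. The registry pattern and public key are placeholders, and the field layout can differ between Kyverno releases, so check it against the version you run. The equivalent one-off check from a shell is cosign verify --key cosign.pub <image>.

    ```yaml
    apiVersion: kyverno.io/v1
    kind: ClusterPolicy
    metadata:
      name: verify-image-signatures
    spec:
      # Block (rather than just report) non-compliant Pods
      validationFailureAction: Enforce
      webhookTimeoutSeconds: 30
      rules:
        - name: require-cosign-signature
          match:
            any:
              - resources:
                  kinds:
                    - Pod
          verifyImages:
            # Placeholder registry pattern; only matching images are checked
            - imageReferences:
                - "registry.example.com/*"
              attestors:
                - entries:
                    - keys:
                        # Placeholder: paste your Cosign public key here
                        publicKeys: |-
                          -----BEGIN PUBLIC KEY-----
                          REPLACE_WITH_COSIGN_PUBLIC_KEY
                          -----END PUBLIC KEY-----
    ```

    In recent Kyverno versions, verifyImages also rewrites verified image tags to their digests by default, which helps against retagged images like the ones in this stage.
    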

    Stage 3 — KICS GitHub Action Compromise (March 10–12, 2026)

    This stage represented the most aggressive escalation. KICS (Keeping Infrastructure as Code Secure) is Checkmarx’s open-source IaC scanner, widely used via its official GitHub Action. TeamPCP leveraged compromised maintainer credentials (obtained via credential stuffing from a separate, unrelated breach) to push a backdoored release of checkmarx/kics-github-action. The malicious version (tagged as a patch release) modified the Action’s entrypoint.sh to exfiltrate the GITHUB_TOKEN and any secrets passed as inputs.

    Because the GITHUB_TOKEN has write access to the repository whenever the repository’s default workflow permissions are read/write and the workflow does not narrow them with permissions:, TeamPCP used these tokens to open stealth pull requests in the downstream repositories that ran the Action, injecting trojanized workflow files that would persist even after the KICS Action was reverted. Socket.dev’s analysis identified over 200 repositories that received these malicious PRs within a 48-hour window. This is exactly the kind of lateral movement that GitOps security patterns with signed commits and branch protection would have mitigated.

    Stage 4 — OpenVSX Malicious Extensions (March 13–15, 2026)

    While Stages 1–3 targeted CI/CD pipelines, Stage 4 pivoted to developer workstations. TeamPCP published a set of VS Code extensions to the OpenVSX registry (the open-source alternative to Microsoft’s marketplace, used by VSCodium, Gitpod, Eclipse Theia, and other editors). The extensions masqueraded as enhanced Trivy and KICS integration tools — “Trivy Lens Pro,” “KICS Inline Fix,” and similar names designed to attract developers already dealing with the fallout from the earlier stages.

    Once installed, the extensions used VS Code’s vscode.workspace.fs API to read .env files, .git/config (for remote URLs and credentials), SSH keys in ~/.ssh/, cloud CLI credential files (~/.aws/credentials, ~/.kube/config, ~/.azure/), and Docker config at ~/.docker/config.json. The exfiltration was performed via seemingly innocent HTTPS requests to a domain disguised as a telemetry endpoint. This is a stark reminder that zero trust isn’t just a network architecture — it applies to your local development environment too.
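
    Before rotating, it helps to know which of the files this stage targeted actually exist on a given workstation. A minimal sketch; `cred_inventory` is a hypothetical helper whose path list mirrors the paragraph above, and the home-directory argument exists so you can point it at a test tree. It reports presence only, which tells you what to rotate, not whether anything was read. Project-level .env and .git/config files live inside each repository and are not covered.

```shell
# List which home-directory credential files targeted in Stage 4 exist.
# Presence means "rotate what this file holds", not proof that it was read.
cred_inventory() {
  local home="${1:-$HOME}" rel
  for rel in .ssh .aws/credentials .kube/config .azure .docker/config.json; do
    if [ -e "$home/$rel" ]; then
      echo "$home/$rel"
    fi
  done
}
```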

    Stage 5 — LiteLLM PyPI Package Compromise (March 16–17, 2026)

    The final stage targeted the AI/ML toolchain. LiteLLM, a popular Python library that provides a unified interface for calling 100+ LLM APIs, was compromised via a dependency confusion attack on PyPI. TeamPCP published litellm-proxy and litellm-utils packages that exploited pip’s dependency resolution to install alongside or instead of the legitimate litellm package in certain configurations (particularly when using --extra-index-url pointing to private registries).

    The malicious packages included a setup.py with an install class override that executed during pip install, harvesting API keys for OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, and other LLM providers from environment variables and configuration files. Given that LLM API keys often have minimal scoping and high rate limits, the financial impact of this stage alone was significant — multiple organizations reported unexpected API bills exceeding $50,000 within hours.
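
    The install-time hook works because setuptools lets setup.py replace its install command through the cmdclass argument, and any code in setup.py can run when pip builds a source distribution. A crude triage sketch, assuming you already have the suspect sdist unpacked in a directory (obtained without installing it): grep setup.py for command overrides. `scan_setup_py` is a hypothetical helper, and legitimate packages use cmdclass too, so a hit means "read this file", not "malicious". Installing with pip's `--only-binary :all:` avoids running setup.py at all, at the cost of failing on packages that publish no wheels.

```shell
# Crude triage: show lines in an unpacked sdist's setup.py that register a
# cmdclass or subclass setuptools' install/develop commands.
scan_setup_py() {
  local setup="$1/setup.py"
  [ -f "$setup" ] || { echo "no setup.py in $1" >&2; return 2; }
  grep -nE 'cmdclass|class[[:space:]]+[A-Za-z_][A-Za-z0-9_]*[[:space:]]*\([^)]*(install|develop)' "$setup"
}
```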

    Payload Mechanism: Technical Breakdown

    Across all five stages, TeamPCP used a consistent payload architecture that reveals a high level of operational maturity:

    • Multi-stage loading: Initial payloads were minimal dropper scripts (under 200 bytes in most cases) that fetched the real payload only after environment fingerprinting confirmed the target was a high-value CI/CD system or developer workstation — not a sandbox or researcher’s honeypot.
    • Environment-aware delivery: The C2 infrastructure used Cloudflare Workers that inspected request headers and TLS fingerprints. Payloads were delivered only when the User-Agent, source IP range (matching known CI provider CIDR blocks), and a custom header derived from the environment fingerprint all matched expected values. Researchers attempting to retrieve payloads from clean environments received benign JSON responses.
    • Fileless persistence: On Linux CI runners, the payload operated entirely in memory using memfd_create syscalls, leaving no artifacts on disk for traditional file-based scanners. On macOS developer workstations, it used launchd plist files with randomized names in ~/Library/LaunchAgents/.
    • Exfiltration via DNS: Stolen credentials were exfiltrated using DNS TXT record queries to attacker-controlled domains — a technique that bypasses most egress firewalls and HTTP-layer monitoring. The data was chunked, encrypted with a per-target AES-256 key derived from the machine fingerprint, and encoded as subdomain labels. If you have security monitoring in place, check your DNS logs immediately.
    • Anti-analysis: The payload checked for common analysis tools (strace, ltrace, gdb, frida) and virtualization indicators (/proc/cpuinfo flags, DMI strings) before executing. If any were detected, it self-deleted and exited cleanly.
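
    The fileless technique still leaves a tell on Linux: a process executed from a memfd_create() descriptor has a /proc/<pid>/exe link that reads /memfd:<name> (deleted). A minimal hunting sketch, assuming shell access to the runner; `find_memfd_processes` is a hypothetical helper, the optional argument exists only so it can be tested against a fake /proc tree, and you need root to see other users' processes. Some legitimate software also executes from memfd, so treat a hit as a lead to investigate, not a verdict.

```shell
# Print "<pid> <exe target>" for processes whose executable is an anonymous
# memfd_create() file; the kernel renders such links as "/memfd:<name> (deleted)".
find_memfd_processes() {
  local proc="${1:-/proc}" exe target pid
  for exe in "$proc"/[0-9]*/exe; do
    # readlink fails for unreadable or vanished processes; skip those.
    target=$(readlink "$exe" 2>/dev/null) || continue
    case "$target" in
      /memfd:*)
        pid=${exe%/exe}
        pid=${pid##*/}
        echo "$pid $target"
        ;;
    esac
  done
}
```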

    Are You Affected? — Incident Response Checklist

    Run through this checklist now. Don’t wait for your next sprint planning session — this is a drop-everything-and-check situation.

    Trivy Plugin Check

    # List installed Trivy plugins and look for the persistence directory
    trivy plugin list
    ls -la "$HOME/.cache/trivy/callbacks/"
    # If the callbacks directory exists with ANY files, assume compromise
    sha256sum "$(command -v trivy)"
    # The official checksums at github.com/aquasecurity/trivy/releases cover the release
    # archives, so compare this hash with the binary extracted from the matching archive
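
    A checksum only proves something if you compare like with like. Trivy's release page publishes a checksums file for the release archives and packages, so the sturdier check is on the archive you downloaded rather than on the installed binary. A minimal sketch; `verify_release_archive` is a hypothetical helper that expects the usual `<sha256>  <filename>` format and uses GNU `sha256sum` (on macOS, `shasum -a 256` prints the same layout).

```shell
# Verify one downloaded release archive against a "<sha256>  <filename>"
# checksums file, such as the one attached to a GitHub release.
verify_release_archive() {
  local archive="$1" sums="$2" name expected actual
  name=$(basename "$archive")
  # Exact match on the filename column (field 2).
  expected=$(awk -v n="$name" '$2 == n { print $1 }' "$sums")
  [ -n "$expected" ] || { echo "no checksum listed for $name" >&2; return 2; }
  actual=$(sha256sum "$archive" | awk '{ print $1 }')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $name"
  else
    echo "MISMATCH: $name" >&2
    return 1
  fi
}
```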

    Docker Image Verification

    # Verify image signatures with cosign
    cosign verify --key cosign.pub your-registry/your-image:tag
    # Check for unexpected entrypoint modifications
    docker inspect --format='{{.Config.Entrypoint}} {{.Config.Cmd}}' your-image:tag
    # Look for the hidden health-check binary
    docker run --rm --entrypoint=/bin/sh your-image:tag -c "ls -la /usr/local/bin/.health*"
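
    Starting a suspect image, even with the entrypoint overridden, still runs code from that image. A more cautious sketch inspects the filesystem without ever starting a container: `docker create` makes a stopped container, `docker export` streams its filesystem as a tar archive, and the hypothetical helper `hidden_bin_entries` lists dot-files sitting directly in bin/sbin directories, where a name like .health-check has no business being.

```shell
# Print dot-files located directly in bin, sbin, usr/bin, usr/sbin,
# usr/local/bin or usr/local/sbin inside a tar archive.
hidden_bin_entries() {
  tar -tf "$1" | grep -E '(^|/)(usr/local/|usr/)?s?bin/\.[^/]+$'
}

# Usage (the container is created but never started):
#   cid=$(docker create your-image:tag)
#   docker export "$cid" > rootfs.tar
#   docker rm "$cid"
#   hidden_bin_entries rootfs.tar
```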

    KICS GitHub Action Audit

    # Search your workflow files for KICS action references
    grep -r "checkmarx/kics-github-action" .github/workflows/
    # Check if you're pinning to a SHA or a mutable tag
    # SAFE: uses: checkmarx/kics-github-action@a4f3b... (SHA pin)
    # UNSAFE: uses: checkmarx/kics-github-action@v2 (mutable tag)
    # Review recent PRs for unexpected workflow file changes
    gh pr list --state all --limit 50 --json title,author,files
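
    Grepping for one action name only catches the action you already know about. As a broader sketch, the hypothetical helper below prints every `uses:` reference under a workflows directory that is not pinned to a full 40-character commit SHA. Local actions (./path) and docker:// references are skipped here; the latter are better checked against your image-digest policy. A trailing `# v4` comment after a SHA pin is still accepted.

```shell
# Print workflow `uses:` references (file:line:content) that are not pinned
# to a full 40-hex-character commit SHA.
unpinned_actions() {
  local dir="${1:-.github/workflows}"
  grep -rnE "^[[:space:]]*(-[[:space:]]+)?uses:" "$dir" \
    | grep -vE "uses:[[:space:]]*[\"']?(\./|docker://)" \
    | grep -vE "@[0-9a-f]{40}([[:space:]\"'#]|\$)"
}
```

    It prints nothing, and returns non-zero, when every reference is pinned.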

    VS Code Extension Audit

    # List all installed extensions (VSCodium users: run codium instead of code)
    code --list-extensions --show-versions
    # Search for the known malicious extension IDs
    code --list-extensions | grep -iE "trivy.lens|kics.inline|trivypro|kicsfix"
    # Check for unexpected LaunchAgents (macOS)
    ls -la ~/Library/LaunchAgents/ | grep -v "com.apple"

    LiteLLM / PyPI Check

    # Check for the malicious packages
    pip list | grep -iE "litellm-proxy|litellm-utils"
    # If found, IMMEDIATELY rotate all LLM API keys
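    # Dry-run the resolution (pip 22.2+) and log which index each package comes from;
    # --only-binary :all: stops pip from running any sdist's setup.py while resolving
    pip install --dry-run --only-binary :all: --log pip-resolve.log litellm
    # Audit your requirements files and pip configuration for extra-index-url settings
    grep -rn "extra-index-url" requirements*.txt pip.conf setup.cfg pyproject.toml 2>/dev/null
    pip config list | grep -i "extra-index-url"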

    DNS Exfiltration Check

    # If you have DNS query logging enabled, search for high-entropy subdomain queries
    # The exfiltration domains used patterns like:
    # [base64-chunk].t1.teampcp[.]xyz
    # [base64-chunk].mx.pcpdata[.]top
    # Check your DNS resolver logs for queries to these domains carrying long subdomain labels
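
    If your resolver logs can be reduced to one queried name per line, a rough first pass is to score each name's leftmost label by length and Shannon entropy. A sketch only: `flag_high_entropy_queries` is a hypothetical helper, and the default thresholds (30 characters, 3.5 bits per character) are illustrative assumptions rather than values from the disclosures. CDNs and some SaaS products also generate long random-looking labels, so baseline against your own traffic before alerting.

```shell
# Score the leftmost DNS label of each queried name (one name per line on
# stdin) and print names whose label is both long and high-entropy.
# Output: <entropy bits/char> <label length> <name>
flag_high_entropy_queries() {
  local min_len="${1:-30}" min_bits="${2:-3.5}"
  awk -v min_len="$min_len" -v min_bits="$min_bits" '
    {
      label = $1
      sub(/\..*/, "", label)          # keep only the leftmost label
      n = length(label)
      if (n < min_len) next
      split("", count)                # reset per-line character counts
      for (i = 1; i <= n; i++) count[substr(label, i, 1)]++
      h = 0                           # Shannon entropy in bits per character
      for (c in count) { p = count[c] / n; h -= p * log(p) / log(2) }
      if (h >= min_bits) printf "%.2f %d %s\n", h, n, $1
    }'
}
```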

    If any of these checks return positive results: Treat it as a confirmed breach. Rotate all credentials (cloud provider keys, GitHub tokens, Docker Hub tokens, LLM API keys, Kubernetes service account tokens), revoke and regenerate SSH keys, and audit your git history for unauthorized commits. Follow your organization’s incident response plan. If you don’t have one, my threat modeling guide is a good place to start building one.

    Long-Term CI/CD Hardening Defenses

    Responding to TeamPCP is necessary, but it’s not sufficient. This attack exploited systemic weaknesses in how the industry consumes open-source dependencies. Here are the defenses that would have prevented or contained each stage:

    1. Pin Everything by Hash, Not Tag

    Mutable tags (:latest, :v2, @v2) trust the registry and publisher on every pull: whatever the tag points to today is what you run. Pin Docker images by sha256 digest. Pin GitHub Actions by full commit SHA. Pin npm/pip packages with lockfiles that include integrity hashes. This single practice would have neutralized Stages 2, 3, and 5.
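
    For pip, hash pinning means hash-checking mode: when requirements carry --hash options (or you pass --require-hashes), pip refuses to install any artifact whose digest is not listed, including a same-named package served from a different index. A rough sketch to find the entries that would break that guarantee; `unhashed_requirements` is a hypothetical helper that assumes the common one-requirement-per-entry layout produced by tools such as pip-compile --generate-hashes.

```shell
# Print requirement entries that carry no --hash option. Lines continued with
# a trailing backslash are joined first.
unhashed_requirements() {
  awk '
    {
      line = $0
      sub(/[ \t]+#.*$/, "", line)            # drop trailing comments
      if (line ~ /\\$/) {                    # continuation: buffer and keep reading
        buf = buf substr(line, 1, length(line) - 1) " "
        next
      }
      entry = buf line
      buf = ""
      gsub(/^[ \t]+|[ \t]+$/, "", entry)
      if (entry == "" || entry ~ /^#/ || entry ~ /^-/) next   # blank, comment, option
      if (entry !~ /--hash=/) print entry
    }' "$1"
}
```

    Once it prints nothing, `pip install --require-hashes -r requirements.txt` makes the policy explicit in CI.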

    2. Verify Signatures with Sigstore/Cosign

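    Adopt Sigstore’s cosign for container image verification and npm audit signatures for npm packages. On the Python side, pip-audit checks dependencies against known-vulnerability databases rather than verifying signatures, so pair it with hash-pinned lockfiles. Require signature verification as a gate in your CI pipeline: unsigned artifacts don’t run, period.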
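
    A signature gate can be a few lines of shell around `cosign verify`. A minimal sketch, assuming key-based signing with a cosign.pub public key and a hypothetical images.txt file listing one image reference per line; `require_signed_images` is a hypothetical helper, and keyless verification would replace --key with the --certificate-identity and --certificate-oidc-issuer flags.

```shell
# Read image references from stdin; return non-zero if any of them fails
# `cosign verify` against the given public key.
require_signed_images() {
  local key="$1" image failed=0
  while IFS= read -r image; do
    [ -n "$image" ] || continue
    # </dev/null keeps cosign from consuming the list being read.
    if cosign verify --key "$key" "$image" </dev/null >/dev/null 2>&1; then
      echo "verified: $image"
    else
      echo "UNSIGNED OR INVALID: $image" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Usage in a pipeline step:
#   require_signed_images cosign.pub < images.txt || exit 1
```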

    3. Scope CI Tokens to Minimum Privilege

    GitHub Actions’ GITHUB_TOKEN still receives broad read/write permissions in repositories and organizations whose default workflow permissions were never tightened. Explicitly set permissions: in every workflow to the minimum required. Use OpenID Connect (OIDC) for cloud provider authentication instead of long-lived secrets. Never pass secrets as Action inputs when you can use OIDC federation.
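
    To find the workflows that still inherit whatever default the repository or organization has, a check for a top-level `permissions:` key is a reasonable first pass. A rough sketch; `workflows_missing_permissions` is a hypothetical helper that only looks for the key at column 0, so a workflow that scopes every job individually will be reported even though it is fine.

```shell
# List workflow files without a top-level `permissions:` key; those inherit
# the repository/organization default GITHUB_TOKEN permissions.
workflows_missing_permissions() {
  local dir="${1:-.github/workflows}" f
  for f in "$dir"/*.yml "$dir"/*.yaml; do
    [ -f "$f" ] || continue           # skip unmatched glob patterns
    grep -q '^permissions:' "$f" || echo "$f"
  done
}
```

    A common least-privilege starting point is a top-level `permissions:` block containing `contents: read`, widened per job only where a step genuinely needs more.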

    4. Enforce Network Egress Controls

    Your CI runners should not have unrestricted internet access. Implement egress filtering that allows only connections to known-good registries (Docker Hub, npm, PyPI, GitHub) and blocks everything else. Monitor DNS queries for high-entropy subdomain patterns — this alone would have caught TeamPCP’s exfiltration channel.
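
    Before enforcing deny-by-default egress, it helps to replay a few days of proxy or DNS logs against the intended allowlist to see what would break and what looks suspicious. A minimal sketch; `unexpected_egress_hosts` is a hypothetical helper, and its default allowlist is an illustrative assumption covering common endpoints of the registries named above. Real pipelines usually need more (mirrors, object storage, your cloud provider's APIs).

```shell
# Print hostnames (one per line on stdin) that do NOT match an allowlisted
# domain. A host matches if it equals an entry or ends with "." + entry.
unexpected_egress_hosts() {
  local allow="${1:-docker.io docker.com pypi.org pythonhosted.org npmjs.org github.com githubusercontent.com}"
  awk -v allow="$allow" '
    BEGIN { n = split(allow, domains, " ") }
    {
      host = tolower($1)
      sub(/\.$/, "", host)            # normalise a trailing root dot
      ok = 0
      for (i = 1; i <= n; i++) {
        d = domains[i]
        if (host == d || (length(host) > length(d) &&
            substr(host, length(host) - length(d)) == "." d)) { ok = 1; break }
      }
      if (!ok) print host
    }'
}
```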

    5. Generate and Verify SBOMs at Every Stage

    An SBOM (Software Bill of Materials) generated at build time and verified at deploy time creates an auditable chain of custody for every component in your software. When a compromised package is identified, you can instantly query your SBOM database to determine which services are affected — turning a weeks-long investigation into a minutes-long query.
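
    With one SBOM stored per service, "which services are affected?" becomes a search. A rough sketch, assuming a hypothetical layout of one CycloneDX JSON file per service in a directory (for example produced by `syft <image> -o cyclonedx-json`); `sboms_containing` is a hypothetical helper that matches component names textually rather than parsing JSON, so a real pipeline would use jq or an SBOM database instead.

```shell
# Print SBOM files (one CycloneDX JSON per service) that list a component
# named exactly $2. The name is used inside a regex, so escape any dots.
sboms_containing() {
  local dir="$1" pkg="$2"
  grep -lE "\"name\"[[:space:]]*:[[:space:]]*\"$pkg\"" "$dir"/*.json
}

# Usage:
#   sboms_containing ./sboms litellm-proxy
```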

    6. Use Hardware Security Keys for Publisher Accounts

    Stage 3 was only possible because maintainer credentials were compromised via credential stuffing. Hardware security keys like the YubiKey 5 NFC make phishing and credential stuffing attacks against registry and GitHub accounts virtually impossible. Every developer and maintainer on your team should have one — they cost $50 and they’re the single highest-ROI security investment you can make.

    The Bigger Picture

    TeamPCP’s attack is a watershed moment for the DevSecOps community. It demonstrates that the open-source supply chain is not just a theoretical risk — it’s an active, exploited attack surface operated by sophisticated threat actors who understand our toolchains better than most defenders do.

    The uncomfortable truth is this: we’ve built an industry on implicit trust in package registries, and that trust model is broken. When your vulnerability scanner can be the vulnerability, when your IaC security Action can be the insecurity, when your AI proxy can be the exfiltration channel — the entire “shift-left” security model needs to shift further: to verification, attestation, and zero trust at every layer.

    I’ve been writing about these exact risks for months — from secrets management to GitOps security patterns to zero trust architecture. TeamPCP just proved that these aren’t theoretical concerns. They’re operational necessities.

    Start today. Pin your dependencies. Verify your signatures. Scope your tokens. Monitor your egress. And if you haven’t already, put an SBOM pipeline in place before the next TeamPCP — because there will be a next one.


    📚 Recommended Reading

    If this attack is a wake-up call for you (it should be), these are the resources I recommend for going deeper on supply chain security and CI/CD hardening:

    • Software Supply Chain Security by Cassie Crossley — The definitive guide to understanding and mitigating supply chain risks across the SDLC.
    • Container Security by Liz Rice — Essential reading for anyone running containers in production. Covers image scanning, runtime security, and the Linux kernel primitives that make isolation work.
    • Hacking Kubernetes by Andrew Martin & Michael Hausenblas — Understand how attackers think about your cluster so you can defend it properly.
    • Securing DevOps by Julien Vehent — Practical, pipeline-focused security that bridges the gap between dev velocity and operational safety.
    • YubiKey 5 NFC — Protect your registry, GitHub, and cloud accounts with phishing-resistant hardware MFA. Non-negotiable for every developer.
