KubeMedic

Your Kubernetes cluster's automated safety net

Protect your clusters with automatic remediation that helps prevent cascading failures, scales resources under load, and keeps workloads stable, all within the safety boundaries and resource quotas you define.

kubectl apply -f https://raw.githubusercontent.com/ikepcampbell/kubemedic/main/config/deploy/kubemedic.yaml
🛡️ Safe by Default
  • Protected system namespaces
  • Resource quotas and scaling limits
  • Automatic state backups
  • Gradual scaling with revert
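
To make "Gradual scaling with revert" concrete, here is a minimal Go sketch of one way a gradual revert could be planned. It is an illustration only, not KubeMedic's actual implementation: the function name `revertSteps` and its `step` parameter are hypothetical.

```go
package main

import "fmt"

// revertSteps returns the replica counts to apply, one per interval, when
// stepping a workload back down from its scaled-up size to its original size.
// step is the maximum number of replicas removed per interval (a hypothetical
// knob, not a documented KubeMedic setting).
func revertSteps(current, original, step int) []int {
	if step < 1 {
		step = 1
	}
	var steps []int
	for current > original {
		current -= step
		if current < original {
			current = original
		}
		steps = append(steps, current)
	}
	return steps
}

func main() {
	// Scaled from 2 to 5 replicas; remove at most 2 replicas per interval.
	fmt.Println(revertSteps(5, 2, 2)) // prints [3 2]
}
```

Stepping down in small increments, rather than dropping straight back to the original size, gives the workload time to show whether it is still healthy at each lower replica count.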
🎯 Common Remediations
  • CPU/Memory-based scaling
  • Pod restart on high error rates
  • HPA limit adjustments
  • Temporary resource overrides
🔒 Built-in Safeguards
  • Maximum 2x scaling factor
  • Rate limiting and cooldowns
  • Resource quota validation
  • Protected resources via labels
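
As an illustration of how the scaling cap and cooldown safeguards might work together, the Go sketch below clamps a requested replica count and checks a cooldown window. This is not KubeMedic's code: `clampReplicas`, `cooldownElapsed`, and the 5-minute cooldown are assumptions made for the example. Only the 2x factor comes from the list above.

```go
package main

import (
	"fmt"
	"time"
)

// maxScaleFactor mirrors the "Maximum 2x scaling factor" safeguard.
const maxScaleFactor = 2

// clampReplicas limits a requested replica count to maxScaleFactor times the
// current count and to an absolute ceiling (for example, a policy's
// temporaryMaxReplicas). It assumes current is at least 1.
func clampReplicas(current, requested, ceiling int) int {
	limit := current * maxScaleFactor
	if limit > ceiling {
		limit = ceiling
	}
	if requested > limit {
		return limit
	}
	return requested
}

// cooldownElapsed reports whether at least cooldown has passed between the
// last remediation and now.
func cooldownElapsed(last, now time.Time, cooldown time.Duration) bool {
	return now.Sub(last) >= cooldown
}

func main() {
	fmt.Println(clampReplicas(2, 10, 5)) // prints 4: capped by the 2x factor
	fmt.Println(clampReplicas(3, 10, 5)) // prints 5: capped by the ceiling

	last := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	now := last.Add(3 * time.Minute)
	fmt.Println(cooldownElapsed(last, now, 5*time.Minute)) // prints false
}
```

Applying both limits means a single remediation can never more than double a workload, and repeated remediations cannot stack up faster than the cooldown allows.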
Example Policy
apiVersion: remediation.kubemedic.io/v1alpha1
kind: SelfRemediationPolicy
metadata:
  name: cpu-scaling
  namespace: my-app
spec:
  rules:
    - name: high-cpu-scale
      conditions:
        - type: PodCPUUsage    # Uses metrics-server directly
          threshold: "80"      # 80% CPU usage
          duration: "5m"
      actions:
        - type: ScaleUp
          target:
            kind: Deployment
            name: my-service
          scalingParams:
            temporaryMaxReplicas: 5
            scalingDuration: "30m"
            revertStrategy: "Gradual"
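
The condition in the policy above (threshold "80", duration "5m") can be read as "CPU usage stayed at 80% or higher for five minutes". Here is a minimal Go sketch of that check, assuming one reading per minute and assuming "at or above" is the intended comparison. `sustainedAbove` is a hypothetical helper, not part of KubeMedic's API.

```go
package main

import "fmt"

// sustainedAbove reports whether the most recent `needed` readings are all at
// or above threshold. With one reading per minute, a "5m" duration maps to
// needed = 5.
func sustainedAbove(readings []float64, threshold float64, needed int) bool {
	if needed <= 0 || len(readings) < needed {
		return false
	}
	for _, r := range readings[len(readings)-needed:] {
		if r < threshold {
			return false
		}
	}
	return true
}

func main() {
	// CPU usage in percent, oldest reading first.
	readings := []float64{62, 85, 91, 88, 83, 90}
	fmt.Println(sustainedAbove(readings, 80, 5)) // prints true

	// A single dip below the threshold resets the condition.
	readings = append(readings, 75)
	fmt.Println(sustainedAbove(readings, 80, 5)) // prints false
}
```

Requiring the whole window to be over the threshold keeps a short CPU spike from triggering a scale-up.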

Monitoring Options

Basic Monitoring
  • Uses the Kubernetes Metrics API directly (served by Metrics Server)
  • Real-time metrics without historical data
  • Standard kubectl commands for monitoring
Advanced Monitoring
  • Optional Prometheus integration
  • Historical data and advanced querying
  • Grafana dashboards available
Prerequisites

Required

  • Kubernetes cluster (v1.16+)
  • Metrics Server installed

Optional

  • Prometheus for historical data
  • Grafana for visualization

Built with Go, the Kubernetes APIs, and controller-runtime, using operator-sdk.