KubeMedic
Your Kubernetes cluster's automated safety net
Protect your clusters with intelligent auto-remediation that prevents cascading failures, automatically scales resources, and maintains system stability, all while respecting your safety boundaries and resource quotas.
kubectl apply -f https://raw.githubusercontent.com/ikepcampbell/kubemedic/main/config/deploy/kubemedic.yaml
🛡️ Safe by Default
- Protected system namespaces
- Resource quotas and scaling limits
- Automatic state backups
- Gradual scaling with revert
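To make "gradual scaling with revert" concrete, here is a minimal Go sketch of stepping a workload back down to its original size after a temporary scale-up. The function name `revertSteps` and the fixed step size are assumptions made for illustration; they are not taken from KubeMedic's source, which may revert differently.

```go
package main

import "fmt"

// revertSteps returns the successive replica counts used to walk a workload
// from its temporary size back down to its original size, at most `step`
// replicas at a time. Hypothetical helper, not part of KubeMedic's API.
func revertSteps(current, original, step int32) []int32 {
	if step < 1 {
		step = 1 // always make progress
	}
	var steps []int32
	for current > original {
		current -= step
		if current < original {
			current = original // never undershoot the original size
		}
		steps = append(steps, current)
	}
	return steps
}

func main() {
	// A deployment scaled from 2 to 5 replicas is reverted one replica at a time.
	fmt.Println(revertSteps(5, 2, 1)) // prints [4 3 2]
}
```

Reverting in small steps lets each intermediate size be observed before the next reduction, which is the usual reason to prefer a gradual revert over a single jump.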
🎯 Common Remediations
- CPU/Memory-based scaling
- Pod restart on high error rates
- HPA limit adjustments
- Temporary resource overrides
🔒 Built-in Safeguards
- Maximum 2x scaling factor
- Rate limiting and cooldowns
- Resource quota validation
- Protected resources via labels
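The safeguards above can be pictured as simple checks that run before any scaling action. The following Go sketch shows one way a 2x cap and a cooldown might be enforced. The names `capReplicas`, `cooldownElapsed`, and `maxScaleFactor` are hypothetical and chosen for this example only; KubeMedic's real implementation may differ.

```go
package main

import (
	"fmt"
	"time"
)

// maxScaleFactor mirrors the "maximum 2x scaling factor" safeguard.
// The constant name is an assumption made for this sketch.
const maxScaleFactor = 2

// capReplicas limits a requested scale-up to maxScaleFactor times the current
// replica count and never returns fewer replicas than are currently running.
func capReplicas(current, requested int32) int32 {
	limit := current * maxScaleFactor
	if requested > limit {
		return limit
	}
	if requested < current {
		return current
	}
	return requested
}

// cooldownElapsed reports whether at least `cooldown` has passed since the
// last remediation action.
func cooldownElapsed(last, now time.Time, cooldown time.Duration) bool {
	return now.Sub(last) >= cooldown
}

func main() {
	// A request for 10 replicas on a 3-replica deployment is capped at 6.
	fmt.Println(capReplicas(3, 10)) // prints 6

	last := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	// Only 2 minutes have passed of a 5-minute cooldown, so no new action yet.
	fmt.Println(cooldownElapsed(last, last.Add(2*time.Minute), 5*time.Minute)) // prints false
}
```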
Example Policy
apiVersion: remediation.kubemedic.io/v1alpha1
kind: SelfRemediationPolicy
metadata:
  name: cpu-scaling
  namespace: my-app
spec:
  rules:
    - name: high-cpu-scale
      conditions:
        - type: PodCPUUsage  # Uses metrics-server directly
          threshold: "80"    # 80% CPU usage
          duration: "5m"
      actions:
        - type: ScaleUp
          target:
            kind: Deployment
            name: my-service
          scalingParams:
            temporaryMaxReplicas: 5
            scalingDuration: "30m"
            revertStrategy: "Gradual"
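The condition in the example policy reads as "CPU usage above 80% for 5 minutes". As an illustration of what that could mean in practice, the Go sketch below checks a series of CPU readings against a threshold and a duration given as strings, in the same form as the policy. The `cpuSample` type and `conditionMet` function are hypothetical, and the sketch assumes readings arrive at regular intervals; it is not KubeMedic's actual evaluation logic.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// cpuSample is one CPU reading (hypothetical type for this sketch).
type cpuSample struct {
	secondsAgo int     // how long ago the reading was taken
	percent    float64 // CPU usage as a percentage
}

// conditionMet reports whether every reading inside the window exceeds the
// threshold, and whether the readings reach back far enough to cover the
// whole window. Threshold and duration are strings, as in the policy.
func conditionMet(samples []cpuSample, threshold, duration string) (bool, error) {
	limit, err := strconv.ParseFloat(threshold, 64)
	if err != nil {
		return false, fmt.Errorf("invalid threshold %q: %w", threshold, err)
	}
	window, err := time.ParseDuration(duration)
	if err != nil {
		return false, fmt.Errorf("invalid duration %q: %w", duration, err)
	}
	coversWindow := false
	for _, s := range samples {
		age := time.Duration(s.secondsAgo) * time.Second
		if age >= window {
			coversWindow = true // we have data at least as old as the window
		}
		if age <= window && s.percent <= limit {
			return false, nil // usage dipped to or below the threshold
		}
	}
	return coversWindow, nil
}

func main() {
	// One reading per minute over the last five minutes, all above 80%.
	samples := []cpuSample{{300, 85}, {240, 91}, {180, 88}, {120, 83}, {60, 90}, {0, 87}}
	met, err := conditionMet(samples, "80", "5m")
	fmt.Println(met, err) // prints true <nil>
}
```

Requiring the readings to span the full window avoids triggering a scale-up on a short spike right after the controller starts collecting data.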
Monitoring Options
Basic Monitoring
- Uses Kubernetes metrics API directly
- Real-time metrics without historical data
- Standard kubectl commands for monitoring
Advanced Monitoring
- Optional Prometheus integration
- Historical data and advanced querying
- Grafana dashboards available
Prerequisites
Required
- Kubernetes cluster (v1.16+)
- Metrics Server installed
Optional
- Prometheus for historical data
- Grafana for visualization
Built with Go, Kubernetes APIs, and controller-runtime using operator-sdk