HighlyAvailableKubernetes

nicolaw 14th September 2020 at 10:34pm
Kubernetes

Considerations

  • CAP - consistency (eventual/data latency), availability, partition tolerance (network/quorum)
    • Consistency can be eventual: some drift is acceptable
  • Backup/restore
    • RTO - Recovery Time Objective
    • RPO - Recovery Point Objective (PITR)
  • Monitoring, alerting, escalation
    • Synthetic application monitoring (white, grey and blackbox)
      • Latency monitoring
    • Timeseries metrics
      • Latency monitoring
  • Versioned Kubernetes Secrets and ConfigMaps
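One possible way to version ConfigMaps (and, by the same pattern, Secrets) is to add a short content hash to the object name, similar in spirit to the hash suffix that Kustomize generators append. Changed data then produces a new object, and a rollback only needs to point the workload back at the old name. The sketch below is illustrative only: the helper names `versioned_name` and `config_map_manifest`, the `app-config` base name and the 10-character digest length are all assumptions, not anything taken from these notes.

```python
import hashlib
import json
from typing import Dict


def versioned_name(base_name: str, data: Dict[str, str], length: int = 10) -> str:
    """Return base_name plus a short digest of the data (hypothetical helper)."""
    # Serialise deterministically so identical data always yields the same name.
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]
    return f"{base_name}-{digest}"


def config_map_manifest(base_name: str, data: Dict[str, str]) -> dict:
    """Build a ConfigMap manifest whose name changes whenever its data changes."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": versioned_name(base_name, data)},
        # Assumption: the cluster is recent enough to support immutable ConfigMaps;
        # drop this field on older clusters.
        "immutable": True,
        "data": data,
    }


if __name__ == "__main__":
    # Emit JSON, which kubectl accepts as well as YAML.
    print(json.dumps(config_map_manifest("app-config", {"LOG_LEVEL": "info"}), indent=2))
```

Because each Deployment revision references a specific ConfigMap name, old versions have to be kept around for as long as a rollback to them should remain possible.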

So What's Next?

  • Upgrade/rollback Kubernetes application - relies on versioned product
  • Full ORM abstraction layer for all database communication, to stage backward/forward compatible schema changes between application deltas - urgh, potentially loads of effort (handled per application, or via a shared library or abstraction service)
  • Geographical/datacentre fault tolerance: deploy all components to at least n+1 availability zones - mostly done
    • Autoscaling Kubernetes node capacity - easy
  • Application fault tolerance: multiple replicas inside Kubernetes - done (?)
    • Pod resource requirements and quotas - easy
    • Deployment RollingUpdate strategy - easy
    • Deployment readinessProbe reflects genuine readiness - easy
  • Versioned Kubernetes Secrets and ConfigMaps - investigate
  • Load shedding: graceful service degradation at system limits - potentially some effort
  • DoS protection - AWS Shield?
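The "easy" application fault tolerance items above (multiple replicas, pod resource requirements, a RollingUpdate strategy and a readinessProbe) could be combined into a single Deployment roughly as sketched below. This is a minimal illustration rather than a recommended configuration: the function name `deployment_manifest`, the image reference, the `/ready` path, port 8080 and all resource figures are placeholder assumptions. Namespace-level quotas (ResourceQuota) are a separate object and are not shown.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int = 3) -> dict:
    """Build an apps/v1 Deployment manifest (hypothetical helper, placeholder values)."""
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        # Requests drive scheduling; limits cap usage. Figures are placeholders.
        "resources": {
            "requests": {"cpu": "100m", "memory": "128Mi"},
            "limits": {"cpu": "500m", "memory": "256Mi"},
        },
        # Assumes the application exposes an endpoint that only succeeds once
        # it can genuinely serve traffic (dependencies reachable, caches warm).
        "readinessProbe": {
            "httpGet": {"path": "/ready", "port": 8080},
            "periodSeconds": 5,
            "failureThreshold": 3,
        },
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            # Never drop below the desired replica count during a rollout;
            # add one extra pod at a time instead.
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    # Emit JSON, which kubectl accepts as well as YAML.
    print(json.dumps(deployment_manifest("example-app", "example.invalid/app:1.2.3"), indent=2))
```

Setting `maxUnavailable` to 0 trades rollout speed for capacity: old pods are only removed once their replacements pass the readiness probe, which is why the probe needs to reflect genuine readiness.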

Zero Downtime Infrastructure Upgrade

Kubernetes Control Plane & Node Groups

Cloud Platform & Data Persistence

  • "Depends"

Related Reading to End Goal