HighlyAvailableKubernetes: ~nicolaw — Übergeek & CatLoaf Facilitator

nicolaw 14th September 2020 at 10:34pm

Kubernetes zero downtime deployment: when theory meets the database - rolling update + backward/forward compatible database schema changes w/data duplication via database access abstraction
Zero-downtime Deployment in Kubernetes with Jenkins - rolling update + blue/green
Zero-Downtime Rolling Updates With Kubernetes - interesting use of lifecycle: preStop: sleep command to ensure no lost client connections from ingress
Enable Rolling updates in Kubernetes with Zero downtime
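
The rolling update and preStop sleep ideas from the links above can be sketched as a single Deployment manifest. This is only a rough sketch: it builds the manifest as a plain Python dict and prints it as JSON (which `kubectl apply -f` accepts in place of YAML). The name `web`, the image `example/web:1.2.3`, the `/healthz/ready` path, port 8080 and the 15 second drain time are placeholder assumptions, not values taken from the linked articles.

```python
import json


def rolling_deployment(name, image, replicas=3, drain_seconds=15):
    """Build an apps/v1 Deployment dict aimed at zero-downtime rolling updates.

    All concrete values (paths, ports, timings) are illustrative assumptions.
    """
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": 8080}],
        # Only receive traffic once the app reports it can genuinely serve it.
        "readinessProbe": {
            "httpGet": {"path": "/healthz/ready", "port": 8080},
            "periodSeconds": 5,
            "failureThreshold": 2,
        },
        # Sleep before SIGTERM is sent so the ingress has time to stop
        # routing new connections to this pod.
        "lifecycle": {
            "preStop": {"exec": {"command": ["sleep", str(drain_seconds)]}}
        },
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "strategy": {
                "type": "RollingUpdate",
                # Never drop below the desired replica count during a rollout.
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Must exceed the preStop sleep plus the app's shutdown time.
                    "terminationGracePeriodSeconds": drain_seconds + 30,
                    "containers": [container],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(rolling_deployment("web", "example/web:1.2.3"), indent=2))
```

The exec-based preStop hook assumes the container image ships a `sleep` binary; a minimal or distroless image would need a different hook.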

CAP - consistency (eventual/data latency), availability, partition tolerance (network/quorum)
- Consistency: eventual consistency is fine, some drift is acceptable
Backup/restore
- RTO - Recovery Time Objective
- RPO - Recovery Point Objective (PITR)
Monitoring, alerting, escalation
- Synthetic application monitoring (white, grey and blackbox)
  - Latency monitoring
- Timeseries metrics
  - Latency monitoring
Versioned Kubernetes Secrets and ConfigMaps
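
One possible way to version ConfigMaps (and, with more care, Secrets) is to put a hash of the contents in the object name, so any change produces a new object and the old one stays around for rollback. The sketch below assumes that approach; the naming scheme and the helper name `versioned_configmap` are made up for illustration. It is similar in spirit to the hash suffix that Kustomize's `configMapGenerator` appends, but it does not reproduce Kustomize's hash.

```python
import hashlib
import json


def versioned_configmap(base_name, data):
    """Return a ConfigMap dict whose name carries a hash of its contents.

    The scheme (base name + first 10 hex chars of SHA-256) is an assumption
    for illustration, not a Kubernetes convention.
    """
    # Serialise deterministically so identical data always yields the same name.
    canonical = json.dumps(data, sort_keys=True).encode("utf-8")
    suffix = hashlib.sha256(canonical).hexdigest()[:10]
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": f"{base_name}-{suffix}"},
        "data": data,
    }


if __name__ == "__main__":
    cm = versioned_configmap("app-config", {"LOG_LEVEL": "info"})
    print(json.dumps(cm, indent=2))
```

Because the Deployment has to reference the new name, a config change also changes the pod template and so triggers a normal rolling update.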

Upgrade/rollback Kubernetes application - relies on versioned product
~~Fully ORM abstraction layer in~~ all database communication to stage backward/forward compatible schema changes between application deltas - urgh potentially loads of effort (handled per application, or shared library or abstraction service)
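
A rough sketch of what staging a backward/forward compatible schema change behind a database access layer might look like, using data duplication (dual writes). Everything here is hypothetical: the `users` table, the rename of `name` to `full_name`, and the `UserStore` class are invented for illustration, and SQLite stands in for whatever database is really in use.

```python
import sqlite3


class UserStore:
    """Hypothetical data-access layer hiding a column rename (name -> full_name)."""

    def __init__(self, conn):
        self.conn = conn

    def save(self, user_id, full_name):
        # Dual write: the old column stays populated for app versions that
        # still read it, the new column is populated for the next version.
        self.conn.execute(
            "INSERT OR REPLACE INTO users (id, name, full_name) VALUES (?, ?, ?)",
            (user_id, full_name, full_name),
        )

    def load(self, user_id):
        # Prefer the new column, fall back to the old one for rows written
        # before the expand step.
        row = self.conn.execute(
            "SELECT COALESCE(full_name, name) FROM users WHERE id = ?",
            (user_id,),
        ).fetchone()
        return row[0] if row else None


def expand_schema(conn):
    """Expand step: add the new column without touching the old one."""
    conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")  # old-version row
    expand_schema(conn)
    store = UserStore(conn)
    store.save(2, "Grace Hopper")
    print(store.load(1), store.load(2))
```

The old column would only be dropped (the contract step) in a later release, once no running application version reads it.
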
Geographical/datacentre fault tolerance: deploy all components to at least n+1 availability zones - mostly done
- Autoscaling Kubernetes node capacity - easy
Application fault tolerance: multiple replicas inside Kubernetes - done (?)
- Pod resource requirements and quotas - easy
- Deployment RollingUpdate strategy - easy
- Deployment readinessProbe reflects genuine readiness - easy
  - Consider using a livenessProbe, but use it correctly - easy
Versioned Kubernetes Secrets and ConfigMaps - investigate
Load shedding: graceful service degradation at system limits - potentially some effort
DoS protection - AWS Shield?
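
For the load shedding item above, a minimal sketch of one common approach: cap the number of in-flight requests and reject the excess immediately rather than queueing it. The `LoadShedder` class, the status codes and the limit are assumptions for illustration; a real service would more likely do this in its web framework, proxy or ingress.

```python
import threading


class LoadShedder:
    """Reject work once `max_in_flight` requests are already being handled."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def handle(self, work):
        # Non-blocking acquire: if no slot is free, shed instead of queueing.
        if not self._slots.acquire(blocking=False):
            return 503, "overloaded, retry later"
        try:
            return 200, work()
        finally:
            # Always free the slot, even if the work raised.
            self._slots.release()


if __name__ == "__main__":
    shedder = LoadShedder(max_in_flight=2)
    print(shedder.handle(lambda: "hello"))
```

Failing fast like this keeps latency bounded for the requests that are accepted, at the cost of pushing retries onto clients.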

Node groups - simply perform blue/green failover of node groups in Kubernetes control plane
EKS - not found anything yet
GKE - https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-upgrading-your-clusters-with-zero-downtime