Respond to production incidents: troubleshoot and resolve them, and prepare after-action reports and postmortem analyses of system, software, network, and storage failures
Design, document, and update development processes and practices
Improve infrastructure and application development
Develop automation that improves deployment speed and service reliability in the containerized environment
Carry out observability, capacity planning, system and service performance analysis, and tuning
Debug problems in production and test environments
Minimize and harden the attack surface from the infrastructure layer to the application layer, including microservices and the public-facing API gateway
Qualifications:
Mastery of the Linux operating system (Red Hat-based distributions preferred)
Extensive experience with container orchestration (Kubernetes) in a production environment
Extensive experience with database management and maintenance (PostgreSQL, backup/recovery, performance tuning)
Strong working knowledge of virtualization (VMware ESX, Veeam Backup, DRS, etc.)
Strong knowledge of load balancers and application proxies such as HAProxy and Traefik (IPVS, Proxy Protocol, LVS)
Familiar with NoSQL databases like Redis, MongoDB
Experience maintaining a full-text search (FTS) engine such as Solr
Familiar with monitoring and logging tools such as Prometheus, Grafana, Loki, Promtail
Familiar with CI/CD procedures in GitLab
Familiar with immutable infrastructure and IaC principles (Terraform)
Familiarity with cloud-native solutions is a plus