Hiring: Site Reliability Engineer
Job Description
Job Tasks:
- Be on a PagerDuty rotation to respond to Jibit availability incidents and support service engineers with customer incidents.
- Use your on-call shift to prevent incidents from ever happening.
- Run our infrastructure with Ansible, Terraform and Kubernetes.
- Make monitoring and alerting fire on symptoms and not on outages.
- Document every action so your findings turn into repeatable actions, and then into automation.
- Improve the deployment process to make it as boring as possible.
- Design, build and maintain core infrastructure pieces that allow Jibit to scale to support hundreds of thousands of concurrent users.
- Debug production issues across services and levels of the stack.
- Plan disaster recovery and practice chaos engineering on the infrastructure to prevent, and be ready for, every possible outage.
- Plan the growth of Jibit’s infrastructure.
Requirements:
- Unix/Linux Knowledge
- Use Ansible to efficiently automate every task
- Implement "Infrastructure as Code" using Terraform and CI/CD for automation
- Monitoring and Metrics in Prometheus, Grafana and integrations with Slack
- Logging infrastructure
- Backend storage management and scaling
- Disaster Recovery and High Availability strategy
- Backup solution for every piece of infrastructure
- Chaos Engineering
- Proficiency in at least one programming language, preferably Go or Python
Required Skills
- reliability
- CI/CD
- Python
- Grafana
Salary
- Salary from 10,000,000 Tomans
Gender
- Not important
Military Service Status
- Not important