Hiring: Site Reliability Engineer (Infra Team)
Job description
Responsibilities:
- Work on-call shifts to prevent incidents before they happen
- Use Infrastructure-as-Code (IaC) to automate the management and provisioning of infrastructure, deployments, version upgrades, backup and data recovery, and security administration.
- Build and maintain strong observability platforms and alerting mechanisms.
- Implement SRE standards to improve reliability, stability and security
- Design and implement a reliable backup solution and disaster recovery plan for all services
- Define SLAs, SLIs, and SLOs for all our services and coordinate across the entire company to meet them
- Collaborate with development and operations teams to identify and implement system improvements, including performance optimizations, capacity planning, and automation of repetitive tasks.
- Investigate and troubleshoot incidents, perform root cause analysis, and implement remediation actions.
- Work with developers to ensure that applications are designed and deployed in a scalable, secure, and efficient manner
Requirements:
- At least 3 years of experience as a DevOps/SRE Engineer or a related role
- Strong problem-solving skills
- Teamwork skills
- Experience with Linux system administration
- Experience with a scripting or programming language (such as Go, Python, or Bash)
- Experience with Kubernetes, Docker, and container orchestration
- Working experience with relational and non-relational databases in a production environment
- Strong experience with Grafana and Prometheus for monitoring and alerting in a production environment.
Required skills
- Network+
- Linux
- Kubernetes
Minimum work experience
- Three to six years
Gender
- No preference
Military service status
- No preference