Hiring: Site Reliability Engineer (Infra Team)

Job Description

**Responsibilities:**

- Collaborate with software development teams to ensure the reliability, performance, and high availability of production systems and services.

- Proactively identify and address issues that could affect system reliability, including through capacity planning and incident response.

- Participate in on-call rotations and respond to emergencies promptly.

- Assist in developing and maintaining service level indicators (SLIs) and service level objectives (SLOs).

- Apply best practices and principles of site reliability engineering throughout the software development lifecycle.

- Monitor and analyze system metrics to proactively identify and resolve potential issues, ensuring high availability and minimal downtime.

- Design and maintain comprehensive monitoring and alerting systems for quick detection and response to incidents.

- Conduct thorough post-incident reviews and root cause analysis, implementing preventive measures to minimize future occurrences.

- Automate operational tasks and processes to improve efficiency and reduce manual effort.

- Implement and maintain disaster recovery and business continuity plans to ensure the integrity and availability of critical systems.

- Identify recurring issues and work with IT and business partners to remediate them through the problem management process.

 

**Requirements:**

- Solid understanding of networking principles, protocols, and troubleshooting techniques.

- Knowledge of distributed systems, microservices architecture, and cloud-native technologies.

- Proficiency with operating systems, networking, and computer systems architecture.

- Experience with technologies such as Nginx, HAProxy, GitLab CI/CD, Docker, Kubernetes, or similar.

- Familiarity with programming languages and runtimes (e.g., Bash, Python, .NET, Node.js).

- Experience with monitoring and observability tools like Prometheus, Grafana, and Zabbix.

- Strong incident management skills, including the ability to triage and resolve issues affecting system reliability and performance.

- Familiarity with error budgeting concepts and the ability to prioritize and allocate error budget for optimal system reliability and availability.

- Knowledge of database administration and performance tuning.

- Proficiency in troubleshooting and resolving performance bottlenecks and complex system issues.

- Strong background in Linux/Unix and Windows server administration.

- Strong communication skills and the ability to work effectively across multiple technical teams.

- Good self-learning and research skills (ability to find an answer to a question or a solution to a problem).

- Good teamwork skills.

- Strong documentation and reporting skills.

- Minimum of 3 years of experience in a similar role, preferably in a large-scale production environment.

**Employment Type:** Full-Time  

**Salary:** Negotiable – Starting from 25 million IRR, depending on technical interview  

**Age Requirement:** Under 30 years - Preferably male  

**Work Hours:** On-site and shift-based  

Required skills

  • reliability
  • irr
  • Python

Minimum work experience

  • Less than three years

Salary

  • Salary from 24,000,000 Toman

Gender

  • Not important

Military service status

  • Not important

Employment type:

Full-time

Posting date:

1403/05/02 (expired)