Hiring: SRE Engineer

Snappfood
Tehran, Tehran

Job Description

● Collaborate with software development teams to ensure the reliability, performance, and high availability of production systems and services.
● Identify and proactively address potential issues that could impact system reliability, including capacity planning and incident response.
● Develop and implement automated solutions to enhance system reliability and performance.
● Participate in on-call rotations and respond to emergencies promptly.
● Assist in developing and maintaining service level indicators (SLIs) and service level objectives (SLOs).
● Apply best practices and principles of site reliability engineering throughout the software development lifecycle.
● Monitor and analyze system metrics to proactively identify and resolve potential issues, ensuring high availability and minimal downtime.
● Design and maintain comprehensive monitoring and alerting systems for quick detection and response to incidents.
● Conduct thorough post-incident reviews and root cause analysis, implementing preventive measures to minimize future occurrences.
● Automate operational tasks and processes to improve efficiency and reduce manual effort.
● Implement and maintain disaster recovery and business continuity plans to ensure the integrity and availability of critical systems.
● Provide support and guidance to development teams in designing and deploying applications in production environments.
● Stay up to date with industry trends and best practices, and actively contribute to infrastructure and operations improvements.

Requirements:

● Solid understanding of networking principles, protocols, and troubleshooting techniques.
● Knowledge of distributed systems, microservices architecture, and cloud-native technologies.
● Proficiency with operating systems, networking, and computer systems architecture.
● Experience with technologies such as Nginx, HAProxy, Chef, Ansible, Terraform, GitLab CI/CD, Docker, Kubernetes, or similar.
● Familiarity with programming languages.
● Experience with monitoring and observability tools like Prometheus, Grafana, and New Relic.
● Strong incident management skills, including the ability to triage and resolve issues affecting system reliability and performance.
● Familiarity with error budgeting concepts and the ability to prioritize and allocate error budget for optimal system reliability and availability.
● Knowledge of database administration and performance tuning.
● Proficiency in troubleshooting and resolving performance bottlenecks and complex system issues.
● Minimum of 3 years of experience in a similar role, preferably in a large-scale production environment.
● Bachelor's degree in computer science, software engineering, or a related field. A master's degree is a plus.

Required Skills

SRE
CI/CD
Docker

Minimum Work Experience

Three to six years

Gender

No preference

Military Service Status

No preference

Employment Type:

Full-time

Job Category:

IT / DevOps / Server

Posting Date:

1402/03/10 (expired)

Similar Jobs

Hiring: Senior Linux Specialist at Chaapaar

Hiring: DevOps Engineer (SRE) at ESM

Hiring: DevOps Engineer (SRE) at Sahab

Hiring: DevOps Engineer at Hermes Capital

Hiring: DevOps Engineer at Telewebion

Hiring: DevOps Engineer at KTFR