
Hiring: SRE Engineer

Snappfood
Tehran, Tehran

Job Description

In the story of Snappfood, we want to create value. Our company aims to create emerging phenomena and is eager to have an SRE Engineer on our team to help us tackle business challenges with creativity, intelligence, and speed of response.
We are waiting for you to join us in this story.

Responsibilities:

● Collaborate with software development teams to ensure the reliability, performance, and high availability of production systems and services.
● Identify and proactively address potential issues that could impact system reliability, including capacity planning and incident response.
● Develop and implement automated solutions to enhance system reliability and performance.
● Participate in on-call rotations and respond to emergencies promptly.
● Assist in developing and maintaining service level indicators (SLIs) and service level objectives (SLOs).
● Apply best practices and principles of site reliability engineering throughout the software development lifecycle.
● Monitor and analyze system metrics to proactively identify and resolve potential issues, ensuring high availability and minimal downtime.
● Design and maintain comprehensive monitoring and alerting systems for quick detection and response to incidents.
● Conduct thorough post-incident reviews and root cause analysis, implementing preventive measures to minimize future occurrences.
● Automate operational tasks and processes to improve efficiency and reduce manual effort.
● Implement and maintain disaster recovery and business continuity plans to ensure the integrity and availability of critical systems.
● Provide support and guidance to development teams in designing and deploying applications in production environments.
● Stay up to date with industry trends and best practices, and actively contribute to improving infrastructure and operations.

Requirements:

● Solid understanding of networking principles, protocols, and troubleshooting techniques. 
● Knowledge of distributed systems, microservices architecture, and cloud-native technologies. 
● Proficiency with operating systems, networking, and computer systems architecture. 
● Experience with technologies such as Nginx, HAProxy, Chef, Ansible, Terraform, GitLab CI/CD, Docker, Kubernetes, or similar. 
● Familiarity with one or more programming languages.
● Experience with monitoring and observability tools such as Prometheus, Grafana, and New Relic.
● Strong incident management skills, including the ability to triage and resolve issues affecting system reliability and performance. 
● Familiarity with error budget concepts and the ability to prioritize and allocate the error budget for optimal system reliability and availability.
● Knowledge of database administration and performance tuning. 
● Proficiency in troubleshooting and resolving performance bottlenecks and complex system issues.
● Minimum of 3 years of experience in a similar role, preferably in a large-scale production environment.
● Bachelor's degree in computer science, software engineering, or a related field. A master's degree is a plus.

Required Skills

  • SRE
  • CI/CD
  • Docker
  • Kubernetes

Minimum Work Experience

  • 3 to 6 years

Gender

  • No preference

Military Service Status

  • No preference

Employment Type:

Full-time

Job Category:

IT / DevOps / Server

Posting Date:

1402/04/20 (Solar Hijri calendar; 11 July 2023), expired