Hiring: Site Reliability Engineer

Snapp Trip
Tehran, Tehran

Job description

About the role:

Snapptrip is a subsidiary of Snapp Group, the largest Internet services company in the Middle East. Snapptrip operates in the field of online travel services (hotels and tickets). We are ambitious, passionate, and excited about pushing the boundaries of the travel industry in order to become the first choice of every traveler in Iran.

The Site Reliability Engineer (SRE) II will work with other Reliability Engineers (RE), Product Managers, Software Engineers, and Architects to produce mission-critical infrastructure, tools, performance improvements, actionable and meaningful performance measurements, and communication to stakeholders. The SRE II is expected to work with management, peers, and customers to define and implement the technical vision and to improve monitoring tools, error detection, and defect elimination, while improving Mean Time to Detection/Resolution, overall service availability, and customer satisfaction.

Take the first step towards your dream career. Every Snapptrip technologies team member brings something unique to the table. Here’s what we are looking for in this role:

Responsibilities:

  • Improve and develop the reliability platform by building out custom tools, infrastructure, and services, and automate manual tasks to reduce toil.
  • Contribute to reference architectures
  • Perform engineering and technical tasks as assigned by applying general engineering principles.
  • Perform independent research in support of technical tasks.
  • Participate in an on-call rotation, have strong written communication skills, and be able to develop working relationships with coworkers.
  • Provide technical expertise and consultation through direct involvement to identify and resolve problems.
  • Propose ideas and solutions within the SRE and Product teams to solve common issues.
  • Bring experience, pragmatism, empathy, and composure to interactions with teams outside of the RE organization.
  • Balance planned and reactive work using basic project planning techniques and technical roadmaps.
  • Negotiate SLIs, SLOs, and SLAs with product owners.
  • Work to improve service reliability, metrics, and sustainability, and to reduce technical debt and operational toil, for live services running at scale.
  • Work across multiple project teams simultaneously to support rapid development efforts.
  • Use data to understand the availability, reliability, and sustainability of our services.

What we are looking for:

  • Bachelor's degree in Computer Science, Software Engineering, or Information Systems
  • 1-3 years of relevant experience 
  • Understanding of and comfort with the GNU/Linux operating system
  • Familiarity with valuable technologies such as: Web Services, Kubernetes, Git, Ansible, Terraform, virtualization, Docker containers, Kafka, RabbitMQ, Redis
  • Familiarity with valuable methodologies such as: Agile, Scrum, Reliability Engineering, 12-factor apps, microservice architecture
  • Familiarity with valuable observability tools such as: Grafana, Prometheus, Zabbix, Elasticsearch, APM tools
  • Networking basics: TCP vs UDP, basic troubleshooting, HTTP – load balancing, firewall, private networks, multi-tier design, scale-out, persistent data
  • Databases: at a minimum, an understanding of the basics (SELECT/INSERT)
  • Service Management – Incident Response, Change, and Problem Management
  • Familiarity with standard infrastructure concepts like load balancers, firewalls, object storage, and where/when they might be used
  • Intellectual curiosity, problem-solving, and openness are key to success in this role
  • Capable of digging into common system performance issues, such as "this is slow", developing metrics, and driving measurable improvements.
  • Can work on different tasks in different systems week to week
  • Knows when to ask for help and when to dig more on their own
  • Experience working in 24×7 environments oriented towards a zero-downtime target

Flexible Remote:

This means that for the majority of the time you can work from home if you want. We would expect you to come into the office one day a week to maintain team knowledge.

Required skills

  • SRE
  • TCP/IP

Minimum work experience

  • Less than three years

Gender

  • No preference

Military service status

  • No preference

Employment type:

Full-time

Posting date:

1400/11/20 (expired)