Hiring: Senior Site Reliability Engineer (SRE)
Job Description
Alibaba is looking for a Senior Site Reliability Engineer (SRE) to help us improve and expand our rapidly growing products.
Responsibilities
- Work within a team of like-minded professionals to plan, deploy, and maintain business-critical applications.
- Consult with development teams and broader engineering groups, and advocate for the adoption of DevOps best practices.
- Make legacy applications twelve-factor compliant and cloud-native ready.
- Automate the provisioning, upgrade, and maintenance of DevOps managed services (Less Toil = More Happiness).
- Troubleshoot problems, involving the appropriate resources and driving resolution of issues with a focus on minimizing impact to our customers.
- Participate in the Agile DevOps design, development, testing, and release of new capabilities and features, with a focus on release and post-production support.
- Provide production support for the suite of apps in the domain, and take part in Agile stand-ups, planning sessions, and deployment activities.
- Participate in the on-call rotation.
- Identify recurring issues and work with IT & Business partners to remediate using the problem management process.
Requirements
- Strong background in Linux/Unix and Windows Server administration
- Strong TCP/IP knowledge
- Expertise with containers and container orchestration technologies, including Docker, Docker Swarm, Kubernetes
- Experience with automation/configuration management using Ansible or Puppet
- Scripting skills in at least one language (Bash, PowerShell, Python, Perl, and/or Ruby)
- Knowledge of at least one programming language or platform (.NET, Node.js, Go)
- Knowledge of best practices and IT operations in an always-up, always-available service
- CI/CD implementation with GitLab
- Expertise with Key-Value and NoSQL solutions (Redis, MongoDB)
- Expertise with relational database solutions (MSSQL, PostgreSQL)
- Experience with continuous monitoring using Zabbix, Prometheus, Grafana, ELK, EFK
- Experience with web servers and load balancers (Nginx, HAProxy)
- Experience with pub/sub messaging systems (RabbitMQ, ActiveMQ, Kafka, etc.)
- Good knowledge of virtualization, especially KVM
- Good knowledge of Git/GitFlow
- Excellent troubleshooting capabilities and an ability to quickly learn new technologies
Big Plus Skills
- Good knowledge of service mesh implementation
- Experience with Hadoop ecosystem
- Experience with OWASP concepts
- Prior production experience with Rancher
- Experience with Spinnaker
Required Skills
- SRE
- reliability
- Node.js
Minimum Work Experience
- Three to six years
Gender
- No preference
Military Service Status
- No preference