We are building a next-generation orchestration platform that enables self-service access to GPU-powered environments for AI engineers, data scientists, and developers. Our platform spans cloud, on-prem, and virtualized environments, exposing infrastructure through APIs and intuitive dashboards.
As a Mid-level to Senior DevOps / SRE Engineer, you will play a key role in making this platform reliable, observable, and scalable, while working in a highly dynamic, R&D-driven environment.
Responsibilities:
Design, implement, and maintain CI/CD pipelines for automated build, test, and deployment.
Manage and improve infrastructure as code (IaC) using tools such as Terraform.
Monitor and optimize application performance, availability, and scalability.
Collaborate with developers to streamline code integration and deployment processes.
Implement and maintain containerization and orchestration platforms (Docker, Kubernetes).
Ensure security, compliance, and governance of cloud environments.
Troubleshoot issues across development, testing, and production environments.
Requirements:
3+ years of experience as a DevOps Engineer or in a similar role.
Strong knowledge of CI/CD tools (Jenkins, GitLab CI).
Hands-on experience with Docker and Kubernetes.
Proficiency with Git, Linux, and scripting languages (Go, Bash/shell, etc.).
Proficiency in working with databases.
Familiarity with monitoring/logging tools (Prometheus, Grafana, ELK stack).
Understanding of networking, security, and system administration.
Nice to Have:
Experience in SRE practices (SLIs/SLOs, incident response, postmortems)
Familiarity with distributed systems and microservices architecture
Exposure to AI/ML infrastructure or GPU-based workloads
Experience working in hybrid and public cloud environments (AWS, GCP, or Azure)
Soft Skills & Mindset:
Strong problem-solving ability and curiosity; comfortable working on ambiguous, unsolved challenges
R&D mindset: you enjoy experimenting, prototyping, and iterating on new ideas
Ability to think beyond tools and understand systems as a whole
Ownership mentality: you take responsibility for reliability and outcomes, not just tasks
Effective communication and collaboration across engineering teams
Resilience under pressure, especially during incidents or system failures
Welcome to Khaneh Hoosh Iran (خانه هوش ایران)!
Khaneh Hoosh Iran is a cohesive research, educational, and executive organization that works to localize artificial intelligence infrastructure and to train specialized talent. Our strategic approach is to close the gap between academic knowledge and the real needs of industry by building a complete value chain, one that accompanies top talent from early identification at the foundational levels through to leading large-scale, strategic industrial projects.