
Hiring: Senior Data Engineer

Snapp Trip
Tehran, Tehran Province

Job Description

Do you have a passion for working with leading, modern, open-source Big Data frameworks on an agile team at Snapptrip?

Who we are:

Our team is cross-functional, with a mix of data analysts, BI developers, and data engineers. We work with open-source tools to help the business make well-informed decisions, while continuously learning and developing ourselves.

Who you are:

First, we believe you are a software engineer who can harness the complexity and volume of data while conserving the resources of the data stack.

Second, you excelled in your programming, databases, data structures, algorithms, and operating systems classes. You are a geek and a tenacious developer with a passion for learning modern, open-source Big Data engines.

Why you should apply:

Snapptrip has both mature and expanding products, all of which are data-driven. In the BI/Data team, we work with leading open-source technologies to collect data from our products (Hotel, Flight, Train, Bus) and extract knowledge from it, so you will have the opportunity to work directly with a range of cutting-edge open-source toolkits and distributed frameworks.

Responsibilities:

●     Collaborate with data team members and other stakeholders to understand data requirements and design efficient data solutions.

●     Develop and maintain scalable data pipelines using Python, Scala, Spark, and the Hadoop ecosystem (a minimal sketch follows this list).

●     Implement and optimize ETL/ELT processes for large-scale data processing.

●     Work with Linux, Git, and CI/CD tools to ensure robust version control and continuous integration of data solutions.

●     Utilize Hive, PostgreSQL, ClickHouse, and other database technologies for data storage and retrieval.

●     Manage data streaming and processing using Kafka.

●     Capture and propagate database changes using Debezium (change data capture).

●     Utilize monitoring tools such as Grafana, Prometheus, and Zabbix to ensure the health and performance of data systems.

●     Optimize query performance.

●     Work with container orchestration tools such as Kubernetes and Helm to deploy and manage data applications.

●     Implement workflow automation using Apache Airflow.

●     Ensure data quality and integrity through the use of Pandas, Parquet, Delta Lake, and other data processing technologies.
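
To make the pipeline responsibility above concrete, here is a minimal, hypothetical sketch of the kind of batch job the role involves: a PySpark application that reads raw events from HDFS, aggregates them, and writes partitioned Parquet. All paths, column names, and the event schema are illustrative placeholders, not Snapptrip's actual data model.

```python
# Minimal PySpark batch pipeline sketch. Paths, column names, and the
# event schema are hypothetical placeholders, not Snapptrip's data model.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("daily-bookings-etl")
    .getOrCreate()
)

# Read raw booking events from HDFS (hypothetical path).
raw = spark.read.parquet("hdfs:///data/raw/bookings/")

# Aggregate confirmed bookings per product per day.
daily_stats = (
    raw
    .filter(F.col("status") == "confirmed")
    .groupBy("product", F.to_date("created_at").alias("day"))
    .agg(
        F.count("*").alias("bookings"),
        F.sum("amount").alias("revenue"),
    )
)

# Write the mart back to HDFS as Parquet, partitioned by day.
(
    daily_stats.write
    .mode("overwrite")
    .partitionBy("day")
    .parquet("hdfs:///data/marts/daily_bookings/")
)

spark.stop()
```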

Core Qualifications:

●     Bachelor's or Master's degree in Computer Engineering (Software | AI), Computer Science, Information Technology, or a related field.

●     Proven experience as a Data Engineer or in a similar role.

●     At least three years of practical experience programming in Scala, Java, or Python (preferably a JVM-based language).

●     Proficiency with Linux.

●     At least two years of practical experience in developing Spark applications.

●     Experience working with relational databases (preferably PostgreSQL)

●     Experience with ClickHouse.

●     Experience with HDFS.

●     Strong SQL skills.

●     Proficiency with Git.

●     Experience with CI/CD tools (e.g., Jenkins).

●     Experience with workflow automation tools such as Apache Airflow and cron jobs.

●     Strong understanding of ETL/ELT processes and data processing technologies like Pandas, Parquet, and Delta Lake.

●     Familiarity with monitoring tools such as Grafana, Prometheus, and Zabbix.

●     Experience with streaming technologies like Kafka and Debezium (a minimal streaming sketch follows this list).

●     Experience with REST API development using Flask, Akka, and FastAPI.

●     A thorough understanding of parallel and distributed computing (we run Spark applications deployed on a Kubernetes cluster and process data on HDFS).

●     A self-starter.

●     Effective communication skills.

●     Familiarity with Kubernetes.

●     Experience with PrestoSQL (Trino).
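
As a concrete illustration of the streaming qualifications above, the following is a minimal, hypothetical Spark Structured Streaming sketch that consumes a Kafka topic (for example, one fed by Debezium CDC) and lands micro-batches on HDFS. The broker address, topic name, and paths are placeholders, and the job assumes the spark-sql-kafka connector package is available on the classpath.

```python
# Minimal Spark Structured Streaming sketch: consume a Kafka topic (e.g.
# one fed by Debezium CDC) and land micro-batches on HDFS as Parquet.
# Broker, topic, and paths are hypothetical; run with the
# spark-sql-kafka connector on the classpath (e.g. via --packages).
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("bookings-cdc-consumer")
    .getOrCreate()
)

# Subscribe to the CDC topic and keep the raw message payload as a string.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("subscribe", "bookings.cdc")
    .option("startingOffsets", "latest")
    .load()
    .select(F.col("value").cast("string").alias("payload"))
)

# Append each micro-batch to HDFS, checkpointing Kafka offsets for recovery.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/raw/bookings_cdc/")
    .option("checkpointLocation", "hdfs:///checkpoints/bookings_cdc/")
    .trigger(processingTime="1 minute")
    .start()
)

query.awaitTermination()
```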

Nice to Have (Preferred Qualifications)

●     Experience with Power BI, SSIS, SSAS, and SQL Server.

●     Familiarity with Airbyte and dbt.

●     Exposure to the ELK stack.

You will be doing:

●     Developing and deploying applications in Scala, Spark (Scala or Python API), and Python to ingest, consume, and analyze data in batch or streaming fashion.

●     Proposing solutions to problematic issues and maintaining the whole data stack, which includes PostgreSQL, HDFS, Kafka, Airflow, Metabase, Spark, Kubernetes, ClickHouse, and PrestoSQL (you will be central to the whole data stack). A minimal orchestration sketch follows this list.

●     Supporting other roles in working with open-source tools and resolving their performance issues in SQL and Spark.
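
For a flavor of the orchestration work, here is a minimal, hypothetical Airflow DAG that schedules a nightly spark-submit against a Kubernetes master, tying together the Airflow, Spark, and Kubernetes pieces of the stack. It assumes Airflow 2.4+ (for the schedule parameter); the API server URL, container image, and job path are placeholders.

```python
# Minimal Airflow DAG sketch (assumes Airflow 2.4+): schedule a nightly
# spark-submit against a Kubernetes master. The API server URL, container
# image, and job path are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_bookings_etl",
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",  # nightly at 02:00
    catchup=False,
) as dag:
    BashOperator(
        task_id="spark_submit_etl",
        bash_command=(
            "spark-submit "
            "--master k8s://https://k8s-apiserver:6443 "  # hypothetical URL
            "--deploy-mode cluster "
            "--conf spark.executor.instances=4 "
            "--conf spark.kubernetes.container.image=spark:3.5 "
            "hdfs:///jobs/daily_bookings_etl.py"  # hypothetical job script
        ),
    )
```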

Required Skills

  • Python
  • Data engineering
  • SQL

Minimum Work Experience

  • Three to six years

Gender

  • No preference

Military Service Status

  • No preference

Employment Type:

Full-time

Posting Date:

1402/11/14 (expired)