Involve Asia is seeking a Data Engineer who will be responsible for supporting data scientists, analysts, and software engineers by providing maintainable infrastructure and tooling to deliver end-to-end solutions to business problems. The successful candidate will work with terabytes of data in a complex data environment supporting multiple products and data stakeholders.

As a Data Engineer, you will be responsible for designing and implementing an analytical environment using in-house and third-party tools. You will use Python and/or Java to automate data activities and enable efficient processing of data that is growing in both volume and complexity. You will also design and implement complex data pipelines and data models for analytical consumption, using tools such as EMR, Kubernetes, and Airflow.

You will write scalable, performant SQL queries over billions of rows of data, and simplify this processing so that insights can be extracted more easily. You should have deep experience designing and managing large datasets and pipelines to enable business use cases.

Key Responsibilities:

    • Design, implement, operate, and improve the analytics platform
    • Design data solutions using various big data technologies and low-latency architectures
    • Collaborate with data scientists, business analysts, product managers, software engineers, and other data engineers to develop, implement, and validate deployed data solutions
    • Maintain the data warehouse with timely, high-quality data
    • Build and maintain data pipelines from internal databases and SaaS applications
    • Understand and implement data engineering best practices
    • Define, uphold, and teach standards for code maintainability and performance in code you submit and review
    • Plan and build applications to acquire data
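As a rough illustration of the pipeline work described above, a minimal sketch of an idempotent load step follows — so that a rerun of the same batch does not duplicate rows. All table, column, and function names here are hypothetical examples, not part of Involve Asia's actual stack; SQLite stands in for the production warehouse.

```python
import sqlite3

def load_daily_events(conn, rows):
    """Idempotently upsert one batch of events, keyed by event_id."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               event_id   TEXT PRIMARY KEY,
               event_date TEXT,
               amount     REAL)"""
    )
    # ON CONFLICT makes the load safe to rerun: existing rows are
    # overwritten instead of duplicated.
    conn.executemany(
        """INSERT INTO events (event_id, event_date, amount)
           VALUES (?, ?, ?)
           ON CONFLICT(event_id) DO UPDATE SET
               event_date = excluded.event_date,
               amount     = excluded.amount""",
        rows,
    )
    conn.commit()

# Loading the same batch twice leaves exactly one row per event_id.
conn = sqlite3.connect(":memory:")
batch = [("e1", "2024-01-01", 10.0), ("e2", "2024-01-01", 5.5)]
load_daily_events(conn, batch)
load_daily_events(conn, batch)
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Idempotency of this kind is what lets an orchestrator such as Airflow safely retry a failed task without corrupting the warehouse.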


Required Qualifications:

    • Expert at writing and optimizing SQL queries
    • Proficiency in Python, Java or similar languages
    • Familiarity with data warehousing concepts
    • Experience in Airflow or other workflow orchestrators
    • Familiarity with basic principles of distributed computing
    • Experience with big data technologies like Spark, Delta Lake or others
    • Proven ability to innovate and lead delivery of complex solutions
    • Excellent verbal and written communication - proven ability to communicate with technical teams and summarize complex analyses in business terms
    • Ability to work with shifting deadlines in a fast-paced environment

Desirable Qualifications:

    • Expertise in ETL optimization: designing, coding, and tuning big data processes using Spark
    • Knowledge of big data architecture concepts like Lambda or Kappa
    • Experience with streaming workflows to process datasets at low latencies
    • Experience in managing data - ensuring data quality, tracking lineage, and improving data discovery and consumption
    • Sound knowledge of distributed systems - able to optimize partitioning, distribution and MPP of high-level data structures
    • Experience in working with large databases, efficiently moving billions of rows, and complex data modeling
    • Familiarity with AWS is a big plus
    • Experience in planning day-to-day tasks, knowing how and what to prioritize, and overseeing their execution
    • Experience as a backend or full-stack software engineer