This is a Data Engineering / Developer position focused on data integration and management at large scale within the Retail domain. The role calls for hands-on experience with the Hadoop ecosystem, SQL, PySpark, and cloud-native data lakes, along with strong analytical and architectural skills.
Requirements
- Strong hands-on expertise in the Hadoop ecosystem (HDFS, Hive, Spark, Oozie, YARN, HBase, Kafka, ZooKeeper).
- Deep understanding of data ingestion, transformation, and storage patterns in large-scale environments.
- Experience with distributed computing, data partitioning, and parallel processing.
- Proficiency in SQL, PySpark, Scala, or Java (an illustrative PySpark sketch follows this list).
- Familiarity with cloud-native data lakes on AWS (EMR, Glue, S3), Azure (HDInsight, ADLS, Synapse), or GCP (Dataproc, BigQuery).
- Knowledge of data governance tools (Apache Atlas, Ranger, Collibra) and workflow orchestration tools such as Airflow and Oozie (an orchestration sketch also follows this list).
- Expertise in Data Warehousing and ETL processes, including Design, Development, Support, Implementation, and Testing.
- Experience in architecture and design, including requirement analysis, performance tuning, data conversion, loading, extraction, transformation, and building job pipelines.
- Strong command of database operations (DDL and DML) and data warehousing implementation models.
- Hands-on experience with the Hadoop ecosystem, including HDFS, Hive, Sqoop, NiFi, and YARN.
- Experience with Mainframe ESP for job scheduling.
- Implementation experience with indexes, table partitioning, collections, analytical functions, and materialized views.
- Proficient with ServiceNow, Confluence, Bitbucket, and JIRA.
- Experience with CI/CD pipelines using Jenkins and SourceTree.
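
To illustrate the kind of PySpark, partitioning, and data-lake work described above, here is a minimal ETL sketch. The Hive table `retail.sales_raw`, the column names, and the S3 target path are assumptions for illustration only, not details from the role description.

```python
# Minimal PySpark ETL sketch: read from Hive, cleanse, write a partitioned data lake table.
# Table name, columns, and S3 path are assumed placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("retail-sales-ingest")
    .enableHiveSupport()  # use the Hive metastore on the Hadoop cluster
    .getOrCreate()
)

# Extract: read raw transactions from a Hive table
raw = spark.table("retail.sales_raw")

# Transform: basic cleansing plus a derived partition column
sales = (
    raw.filter(F.col("order_id").isNotNull())
       .withColumn("sale_date", F.to_date("order_ts"))
)

# Load: write to the data lake, partitioned by date for parallel reads
(
    sales.write
         .mode("overwrite")
         .partitionBy("sale_date")
         .parquet("s3://example-retail-lake/curated/sales/")
)
```

Partitioning the output by `sale_date` keeps downstream queries and parallel reads efficient, which maps directly to the distributed-computing and data-partitioning requirements above.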
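
Similarly, a minimal Airflow orchestration sketch that could schedule a job like the one above. The DAG id, schedule, script path, and connection id are assumptions, and it presumes Airflow 2.4+ with the Apache Spark provider installed.

```python
# Minimal Airflow DAG sketch: submit the PySpark job daily via spark-submit.
# dag_id, schedule, application path, and conn_id are assumed placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="retail_sales_ingest",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # one run per day, after upstream feeds land
    catchup=False,
) as dag:
    ingest = SparkSubmitOperator(
        task_id="ingest_sales",
        application="/opt/jobs/retail_sales_ingest.py",  # assumed script location
        conn_id="spark_default",
    )
```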