Key Responsibilities:
· Develop and maintain real-time data pipelines using Apache Spark Structured Streaming with Kafka.
· Parse and process complex data formats, including XML, JSON, and CSV.
· Apply advanced Spark optimization techniques such as broadcast joins, partitioning, and caching.
· Implement SCD Type 1 & Type 2 logic within data warehouse models.
· Design and orchestrate ETL workflows using:
  o AWS Glue (trigger-based jobs)
  o EventBridge, Lambda, and Step Functions
· Work with the AWS Big Data ecosystem, including:
  o AWS Glue for serverless ETL
  o Amazon S3 and Athena for data lake querying
  o Lambda and EventBridge for event-driven workflows
  o DynamoDB for real-time lookup tables
· Manage Apache Iceberg internals, including Compaction, Schema Evolution, Time Travel, and Snapshot Management.
· Utilize PyTest for writing unit and integration tests for PySpark pipelines.
· Perform local development & testing of Glue jobs using Docker & PySpark.
· Deploy and manage data jobs through CI/CD pipelines, leveraging Jenkins, GitHub, and JFrog Artifactory.
· Collaborate with architects and stakeholders to define scalable and efficient solutions aligned with business needs.

Key Requirements:
· Strong expertise in Apache Spark Structured Streaming, Kafka, and PySpark.
· Proven experience with Spark job performance tuning.
· Proficiency in parsing & transforming XML data.
· Deep understanding of AWS services, including Glue, Lambda, S3, DynamoDB, and Athena.
· Hands-on experience with Apache Iceberg internals, covering compaction, schema evolution, and metadata management.
· Familiarity with Docker-based local Glue development.
· Strong testing discipline using PyTest.
· Experience implementing SCD1/2 logic in data pipelines.
· Solid background in CI/CD environments using Jenkins, GitHub, and JFrog.
· Excellent communication and problem-solving skills.

Preferred Skills:
· Experience in the Banking or Financial Services domain.
· Familiarity with data governance, lineage, or compliance frameworks.
· Exposure to Terraform, CloudFormation, or Airflow.