Job Details
Overview
We are seeking a highly skilled and experienced Senior Big Data/PySpark Engineer to join our dynamic Big Data Analytics team. The ideal candidate will have a strong background in Python programming and extensive experience with Apache Spark, particularly PySpark, for large-scale data processing and analytics. This role involves designing, developing, and optimizing robust and scalable data pipelines, working with vast datasets, and contributing to the architecture of our Big Data solutions. A key responsibility of this role is to work directly with business stakeholders to implement solutions and provide technical leadership to junior developers.
Responsibilities
- Design, develop, and maintain efficient, scalable, and reliable data pipelines using PySpark and other Big Data technologies.
- Implement complex data transformations, aggregations, and data quality checks on large datasets.
- Collaborate directly with business and technology stakeholders to understand data requirements and translate them into technical specifications and end-to-end solutions.
- Contribute significantly to data modeling, data architecture, data quality frameworks, and data governance policies.
- Optimize PySpark and other data processing jobs for performance, efficiency, and cost-effectiveness.
- Utilize Apache products (e.g., Kafka, Flink, NiFi) for real-time data streaming and transformation.
- Develop and maintain comprehensive documentation for data pipelines, data models, and data processing logic.
- Participate in code reviews, ensuring high code quality and adherence to best practices and established standards.
- Troubleshoot and resolve complex issues in existing data pipelines and data processing jobs.
- Provide strong leadership and mentorship to junior developers, assisting them with technical challenges and fostering their growth.
- Stay up to date with the latest advancements in PySpark, Apache Spark, and the broader Big Data ecosystem.
Required Qualifications
- 8-12 years of relevant experience in software development with a focus on Big Data technologies.
- Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related field.
- 5+ years of deep, hands-on experience with PySpark for large-scale data processing.
- Strong proficiency and hands-on experience in Python programming, including object-oriented design and data manipulation libraries (e.g., Pandas, NumPy).
- In-depth understanding of Apache Spark architecture, including Spark Core, Spark SQL, and the DataFrame API.
- Extensive hands-on experience with the Big Data ecosystem, including HDFS and other Apache products for data streaming and transformation (e.g., Kafka, Flink, NiFi).
- Proven experience working directly with business users to gather requirements and implement technical solutions.
- Demonstrated leadership experience with the ability to mentor and provide technical guidance to junior developers.
- Experience with various data storage technologies such as HDFS, S3, Azure Blob Storage, or similar distributed storage systems.
- Solid understanding of relational databases and SQL.
- Experience with version control systems (e.g., Git).
- Excellent problem-solving, analytical, and communication skills.
Preferred Qualifications
- Experience with cloud platforms (AWS, Azure, GCP) and their Big Data services (e.g., EMR, Databricks, Glue, Azure Synapse, Google Dataproc).
- Familiarity with workflow orchestration tools (e.g., Apache Airflow, Luigi).
- Hands-on experience with streaming data processing frameworks (e.g., Kafka Streams, Spark Streaming, Flink).
- Strong knowledge of data warehousing concepts and advanced data modeling techniques.
- Experience with containerization technologies (e.g., Docker, Kubernetes).
- Deep understanding of data governance, data security, and compliance best practices.
Education:
- Bachelor’s degree/University degree or equivalent experience
- Master’s degree preferred
This job description provides a high-level review of the types of work performed. Other job-related duties may be assigned as required.
Job Family Group:
Technology
Job Family:
Applications Development
Time Type:
Full time
Most Relevant Skills
Please see the requirements listed above.
Other Relevant Skills
For complementary skills, please see above and/or contact the recruiter.
Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.
If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.
View Citi’s EEO Policy Statement and the Know Your Rights poster.
About Citi
Working at Citi is far more than just a job. A career with us means joining a team of more than 200,000 dedicated people from around...