Job details
The Data Reliability Engineer supports the design, implementation, and ongoing improvement of cloud-native, containerized infrastructure that powers our data products and services. This role contributes to the reliability, scalability, security, and operational health of our data ecosystem while working closely with more experienced engineers and cross-functional teams.
In this role, you will help maintain and enhance platforms and services used across the Data organization. You will participate in day-to-day engineering efforts, support production operations, assist with automation and infrastructure improvements, and contribute to the successful delivery of data platform capabilities.
You will work with a modern data stack that includes data services running on EKS, such as Superset, Trino, Apache Doris, ClickHouse, OpenMetadata, and other platform components, as well as Databricks (Spark jobs), Airflow, and other Big Data technologies. The team is also expanding into new areas, including the deployment and implementation of AI agents, which offers opportunities to contribute to innovative solutions at the intersection of data and AI.
This is a strong opportunity for an engineer with a solid technical foundation who is eager to grow skills in cloud infrastructure, platform engineering, data systems, and reliability practices in a collaborative environment.
This is a hybrid position. The expected number of in-office days will be confirmed by your hiring manager.
Qualifications
- 3+ years of hands-on experience designing and operating cloud-native infrastructure (AWS/Azure/GCP).
- Bachelor’s degree in Computer Science, Engineering, or a related field (desirable but not mandatory).
- Knowledge of Infrastructure as Code (Terraform), including contributing to reusable modules and platform components.
- Good understanding of Kubernetes and container orchestration concepts.
- Familiarity with CI/CD systems, pipeline configuration, automation, and secure deployment practices.
- Foundational knowledge of reliability engineering concepts (SLOs, error budgets, incident response).
- Basic understanding of database technologies including SQL, NoSQL, and common data storage patterns.
- Experience using observability tools and stacks (Prometheus, Grafana, OpenTelemetry, ELK/EFK, Datadog, or similar).
- Basic automation experience using Bash, Python, or Ansible-like tools.
- Working knowledge of software engineering practices including version control, testing, code reviews, and common design patterns.
- Experience participating in on-call rotations, incident response, and post-incident reviews.
Additional Information
Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.