Job details
Role Summary:
We are seeking a Senior Director to lead our cloud platform and reliability engineering strategy and execution. This role is accountable for building and operating a secure, scalable, and highly reliable cloud environment that enables engineering teams to deliver products faster and more safely.
This is a senior leadership position requiring deep cloud expertise, strong operational discipline, and the ability to influence across technology, security, and business teams.
Key Responsibilities:
- Define and lead the cloud platform strategy, including infrastructure, reliability, scalability, and cost optimization.
- Drive the run‑cost optimization strategy across infrastructure, platforms, and shared services, ensuring cost efficiency without compromising reliability, security, or release velocity.
- Demonstrated experience operating and evolving large‑scale cloud environments with measurable scope.
- Proven ownership of 24x7 production operations with strict SLOs/SLAs, including availability, latency, error rates, and recovery objectives (RTO/RPO).
- Experience managing complex operational concerns at scale, including:
  - Capacity planning and performance management
  - Incident response for high‑severity, customer‑impacting events
  - Change management and release safety at scale
  - Vendor and third‑party service dependencies
- Build, lead, and mentor high‑performing engineering teams across platform engineering, cloud operations, and site reliability.
- Drive an automation‑first operating model, reducing manual work through Infrastructure as Code, CI/CD enablement, and standardized self‑service platforms.
- Own service reliability and operational excellence, including incident management, root cause analysis, and long‑term remediation.
- Partner with engineering, product, and security leaders to align platform capabilities with business priorities.
- Establish and enforce secure‑by‑design practices for infrastructure, applications, and data.
- Implement strong observability standards (monitoring, logging, metrics, alerting) to improve service health and decision‑making.
- Define and validate resilience and disaster recovery strategies, including failure testing and recovery readiness.
- Provide clear executive‑level reporting on platform health, risks, and roadmap progress.
- Manage third‑party vendors and cloud service partners as needed.
- Ensure effective 24x7 operational coverage, while continuously reducing operational burden through engineering improvements.
This is a hybrid position. The expected number of days in the office will be confirmed by your hiring manager.
Qualifications
Required Qualifications:
- 15+ years of experience in engineering, infrastructure, cloud platform, or site reliability roles, with significant leadership responsibility
- Proven experience leading cloud platforms or cloud‑based services at scale (AWS, Azure, or GCP)
- Demonstrated experience running production systems with high availability and reliability expectations
- Strong ability to lead in a matrixed, cross‑functional environment
- Excellent communication skills, including the ability to explain complex technical topics to senior leadership
- Strong technical foundation in:
  - Infrastructure as Code and configuration management
  - CI/CD and release engineering practices
  - Monitoring, logging, and alerting systems
  - Linux environments and scripting or programming (e.g., Python)
Additional Information
Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.