PowerToFly
Manager - Site Reliability Engineering - Python, Terraform, CI/CD, Observability
VISA

Onsite | Bengaluru, India | Full Time
Posted 32 minutes ago

Job Details

Job Description

The SRE Manager leads a team of Site Reliability Engineers responsible for delivering high availability, security, and performance across multiple Products within the Value-Added Services organization. This role balances hands-on technical oversight with people leadership, guiding team execution in incident response, automation, observability, environment management, and operations. The Manager champions consistency, operational maturity, and the use of Generative AI and automation to reduce toil and strengthen reliability engineering practices. Acting as a cross-functional partner, this leader will collaborate with product, engineering, and operations teams to implement resilient designs, scalable processes, and continuous improvement across production systems.

Responsibilities:

  • Lead the delivery of secure, reliable, and high-performing application services across distributed and hybrid environments.
  • Improve operational excellence, engineering discipline, and team execution through coaching, prioritization, and consistent process reinforcement.
  • Drive zero‑downtime reliability with proactive monitoring, structured incident response, and rigorous root‑cause remediation.
  • Oversee full environment management lifecycle: deployment governance, configuration updates, operational readiness assessments, and risk evaluation.
  • Foster an inclusive, collaborative, and high‑accountability culture focused on continuous learning and team‑wide development.
  • Build strong relationships with engineering, product, architecture, and operations to align on service priorities and long‑term reliability goals.
  • Communicate effectively with technical and non‑technical audiences, providing frameworks for decision‑making and problem‑solving.
  • Champion automation and Generative AI tooling to reduce manual processes, eliminate toil, and scale operational capabilities.
  • Lead cloud and hybrid infrastructure adoption initiatives with a focus on resilience and minimal downtime.
  • Facilitate incident bridges, coordinate cross‑team collaboration, and ensure proper escalation paths for critical issues.
  • Proactively communicate operational insights, risks, and status updates to cross‑functional stakeholders and PRE leadership.
  • Ensure the SRE team consistently delivers secure, stable, and efficient infrastructure aligned with business and engineering objectives.
  • Establish and track key SRE performance indicators (SLOs, error budgets, operational KPIs).
  • Drive growth, upskilling, and performance development across the SRE team, supporting engineers at multiple experience levels.

This is a hybrid position. Expectation of days in office will be confirmed by your hiring manager.


Qualifications

Basic Qualifications:

  • 8-11 years of relevant experience in SRE, Systems Engineering, or Software Engineering.
  • 2-4 years of experience leading engineers (people leadership or technical leadership).
  • Demonstrated ability to manage and prioritize team execution across incidents, change management, operations, and automation.
  • Strong understanding of distributed systems, on-prem and cloud architectures, microservices, containers, and API ecosystems.
  • Proven ability to drive troubleshooting, RCA, and performance improvements.
  • Familiarity with Linux/Unix systems, CI/CD workflows, networking fundamentals, and observability practices.
  • Ability to communicate complex technical topics to senior leadership, cross‑functional stakeholders, and non‑technical audiences.
  • Proven ability to build team capability through mentorship, feedback, and performance coaching.
  • Experience driving the adoption of automation and/or Generative AI to improve operational efficiency.
  • Experience supporting or leading 24x7 operations and on‑call programs.


Preferred Qualifications:

  • Hands-on experience with Java/J2EE, REST/SOAP architectures, and distributed services.
  • Direct experience supporting containerized applications and cloud platforms (AWS, GCP).
  • Expertise in Linux, Jenkins, Java/.NET applications, relational DBs, Tomcat, and Apache.
  • Proficiency in scripting and automation (Bash, Python, JavaScript, etc.).
  • Strong knowledge of infrastructure components (Linux, VMs, MQ, storage).
  • Understanding of Generative AI and its operational applications.
  • Experience building tools and automation to streamline production support.
  • Solid understanding of observability platforms and best practices.

Additional Information

Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.


Company Details
VISA
 Foster City, CA, United States
Work at VISA

At Visa, we are driven by a common purpose – to uplift everyone, everywhere by being the best way to pay and be paid. As our products and...
