Site Reliability Engineer - Director - Software Production Management & Reliability Engineering
Morgan Stanley

Onsite Bengaluru, India Full Time
Posted 8 hours ago

Job Details

Profile Description

We’re seeking someone to join our CDRR Technology team as a Site Reliability Engineer in Cyber, to help drive performance, reliability, enhanced observability and efficiency for the department’s Data Obfuscation system.

In the Technology division, we leverage innovation to build the connections and capabilities that power our Firm, enabling our clients and colleagues to redefine markets and shape the future of our communities.

This is a Director position that oversees the production environment, ensures the operational reliability of deployed software, and implements strategies to optimize performance and minimize downtime.

Since 1935, Morgan Stanley has been known as a global leader in financial services, always evolving and innovating to better serve our clients and our communities in more than 40 countries around the world.

What you’ll do in the role:

  • Maintain System Reliability: Monitor, maintain and improve the reliability and performance of production systems, ensuring adherence to SLIs, SLOs and managing error budgets effectively
  • Incident Management: Participate in an on-call rotation to respond to technical emergencies, outages and user escalations, providing timely diagnosis and resolution while following incident management best practices
  • Automation & TOIL Reduction: Develop and maintain automation scripts using Python and Bash to reduce manual operational overhead, improve efficiency and eliminate repetitive tasks
  • Observability & Monitoring: Design, implement and maintain observability solutions using tools like Prometheus/Cortex/Mimir, Loki, Tempo and Grafana, including creating and optimizing dashboards for system visibility
  • Database Operations: Perform database administration, troubleshooting and performance optimization activities, diagnosing issues and implementing improvements to ensure optimal database health
  • Ticket & Project Management: Manage and resolve incidents, requests and problem tickets through ServiceNow and JIRA, ensuring proper documentation and knowledge sharing
  • Collaboration & Escalation: Serve as an operational point of escalation for technical issues, working closely with development teams, stakeholders and clients to deliver reliable solutions
  • Data Pipeline Support: Support and troubleshoot data transfer technologies including ETL tools, Kafka, MQ and other messaging/pipeline systems to ensure seamless data flow
  • Continuous Improvement: Identify opportunities for system improvements, contribute to capacity planning, and implement proactive measures to prevent incidents and enhance system resilience

What you’ll bring to the role:

  • The successful candidate will have 4–6 years of experience in Operations and SRE activities.
  • Strong Linux troubleshooting skills
  • Hands-on experience in Python for TOIL reduction and automation (Bash scripting is a plus)
  • Good working knowledge of ServiceNow (Incidents, Requests and Problem tickets) and JIRA
  • Willingness to participate in an on-call rota to support user escalations
  • Strong team player with the ability to build effective working relationships with colleagues and stakeholders
  • Solid understanding of SRE principles including SLIs, SLOs and Error Budget management
  • Excellent oral and written communication skills
  • Experience in database administration, engineering or troubleshooting, ideally including performance optimization and the ability to diagnose, debug and suggest improvements
  • Ability to respond appropriately during technical emergencies, such as outages
  • Observability stack knowledge across Prometheus/Cortex/Mimir, Loki, Tempo and Grafana, including the ability to create and maintain Grafana dashboards
  • Experience with data transfer technologies such as ETL (e.g. Talend, Informatica), Kafka, MQ and similar messaging/pipeline technologies
  • Software engineering or data engineering experience
  • Experience serving as an operational point of escalation

WHAT YOU CAN EXPECT FROM MORGAN STANLEY:

At Morgan Stanley, we raise, manage and allocate capital for our clients – helping them reach their goals. We do it in a way that’s differentiated – and we’ve done that for 90 years. Our values – putting clients first, doing the right thing, leading with exceptional ideas, committing to diversity and inclusion, and giving back – aren’t just beliefs; they guide the decisions we make every day to do what’s best for our clients, communities and more than 80,000 employees in 1,200 offices across 42 countries. At Morgan Stanley, you’ll find an opportunity to work alongside the best and the brightest, in an environment where you are supported and empowered. Our teams are relentless collaborators and creative thinkers, fueled by their diverse backgrounds and experiences. We are proud to support our employees and their families at every point along their work-life journey, offering some of the most attractive and comprehensive employee benefits and perks in the industry. There’s also ample opportunity to move about the business for those who show passion and grit in their work.

To learn more about our offices across the globe, please copy and paste https://www.morganstanley.com/about-us/global-offices into your browser.

Morgan Stanley is an equal opportunities employer. We work to provide a supportive and inclusive environment where all individuals can maximize their full potential. Our skilled and creative workforce is comprised of individuals drawn from a broad cross section of the global communities in which we operate and who reflect a variety of backgrounds, talents, perspectives, and experiences. Our strong commitment to a culture of inclusion is evident through our constant focus on recruiting, developing, and advancing individuals based on their skills and talents.

Company Details
Morgan Stanley
 New York City, NY, United States