Infrastructure Production Management & Reliability Engineering III - AVP / Director P3 - ETS
Morgan Stanley

Onsite Hong Kong, Hong Kong Full Time
Posted 16 hours ago

Job Details

About Morgan Stanley

Morgan Stanley is a leading global financial services firm providing a wide range of investment banking, securities, investment management and wealth management services. The Firm's employees serve clients worldwide including corporations, governments, and individuals from more than 1,200 offices in 43 countries.
 

As a market leader, the talent and passion of our people are critical to our success. Together, we share a common set of values rooted in integrity, excellence, and a strong team ethic. Morgan Stanley can provide a superior foundation for building a professional career – a place for people to learn, to achieve and to grow. A philosophy that balances personal lifestyles, perspectives and needs is an important part of our culture.

Overview
Join Morgan Stanley’s Application Services Infrastructure team to keep a set of business-critical infrastructure applications reliable for technologists across the firm. Our platforms help teams schedule, coordinate, and monitor their production workloads.

You’ll combine deep Linux troubleshooting with automation and reliability engineering: improving monitoring, reducing toil, leading upgrades, and driving root-cause fixes that prevent repeat incidents.

What you’ll do
- Own production reliability for multiple infrastructure applications: incident response, triage, and sustained follow-through to resolution.
- Drive stability work: improve alerting quality, monitoring coverage, and operational tooling to reduce noise and speed recovery.
- Lead or execute production changes (upgrades, hygiene fixes, reconfiguration) with strong change-management and rollback planning.
- Perform in-depth RCAs and prevent recurrence of incidents and escalations through long-term fixes, automation, and better runbooks.
- Build self-service workflows and high-quality documentation to improve user experience and reduce time-to-production.
- Partner with product engineers and infrastructure teams to identify systemic issues and deliver cross-team solutions.

On-call & schedule
- After onboarding, you’ll join a rotating on-call roster with periodic weekend coverage (~1 weekend/month).
- L3 support focuses on high-impact incidents where documentation is incomplete—success requires calm, structured troubleshooting in distributed systems.
- Occasional off-hours work may be needed for planned changes and incident follow-up (we aim to minimize this through automation and process).

Required experience
- At least 7 years of production support / reliability experience for applications on Linux/UNIX.
- Strong command-line troubleshooting skills: logs, processes, networking, and dependency health in distributed systems.
- Ability to write production-ready automation in bash/shell plus one language (Python preferred; Go/Ruby/Perl/C/others welcome).
- Strong written communication for technical documentation and incident/RCA write-ups.
- Working understanding of distributed architecture (load balancers, app servers, databases, messaging).
- AI-assisted development and operational automation.

Preferred experience
- Cloud-native deployment/support and/or containers (Docker/podman).
- Observability tooling (Grafana, Splunk, or similar), log forwarding/agents, and alert tuning.
- Linux administration and performance troubleshooting.
- Any database experience (SQL/NoSQL).
- Experience with workflow/scheduling platforms (Autosys, Apache Airflow) or coordination systems (Apache Zookeeper).

Work model
Hybrid: 3 days/week in-office

WHAT YOU CAN EXPECT FROM MORGAN STANLEY:

At Morgan Stanley, we raise, manage and allocate capital for our clients – helping them reach their goals. We do it in a way that’s differentiated – and we’ve done that for 90 years. Our values - putting clients first, doing the right thing, leading with exceptional ideas, committing to diversity and inclusion, and giving back - aren’t just beliefs; they guide the decisions we make every day to do what’s best for our clients, communities and more than 80,000 employees in 1,200 offices across 42 countries. At Morgan Stanley, you’ll find an opportunity to work alongside the best and the brightest, in an environment where you are supported and empowered. Our teams are relentless collaborators and creative thinkers, fueled by their diverse backgrounds and experiences. We are proud to support our employees and their families at every point along their work-life journey, offering some of the most attractive and comprehensive employee benefits and perks in the industry. There’s also ample opportunity to move about the business for those who show passion and grit in their work.

To learn more about our offices across the globe, please copy and paste https://www.morganstanley.com/about-us/global-offices into your browser.

Morgan Stanley is an equal opportunities employer. We work to provide a supportive and inclusive environment where all individuals can maximize their full potential. Our skilled and creative workforce is comprised of individuals drawn from a broad cross section of the global communities in which we operate and who reflect a variety of backgrounds, talents, perspectives, and experiences. Our strong commitment to a culture of inclusion is evident through our constant focus on recruiting, developing, and advancing individuals based on their skills and talents.
