Job Details
Morgan Stanley is a leading global financial services firm providing a wide range of investment banking, securities, investment management and wealth management services. We advise, originate, trade, manage and distribute capital for governments, institutions, and individuals. As a market leader, the talent and passion of our people are critical to our success. Together, we share a common set of values rooted in integrity, excellence, and a strong team ethic. We provide you a superior foundation for building a professional career where you can learn, achieve, and grow.
Technology is the key differentiator that ensures that we manage our global businesses and serve clients on a market-leading platform that is resilient, safe, efficient, smart, fast, and flexible. The Technology division partners with our business units and leading technology companies to redefine how we do business in ever more global and dynamic financial markets.
Our sizeable investment in technology results in leading-edge tools, software, and systems. Our insights, applications, and infrastructure give a competitive edge to clients’ businesses and to our own.
The Core Services L3 operation team is part of the Enterprise Computing Data Services Organization at Morgan Stanley. It sits under the Application Infrastructure Service fleet and is responsible for the global enterprise-level infrastructure built on top of Red Hat OpenShift, Ansible Automation Platform, Terraform, Apache ZooKeeper, and related products for application management, orchestration, and infrastructure as code.
This position is a senior operations manager role for Hong Kong with SRE (Site Reliability Engineering) responsibility for managing and improving the global middleware infrastructure.
The successful candidate will be the incident manager and escalation manager for the global production infrastructure during Asia time zone hours. The person will also lead run-the-bank projects such as data center migrations, infrastructure upgrades, and tooling releases. In addition, the person will participate in at least one squad as an SRE, following Agile practices and contributing to infrastructure modernization and automation.
Required Skills
- 8+ years of overall enterprise-level IT experience.
- Strong incident management skills with a sound understanding of ITIL principles.
- Strong domain expertise in at least one of Ansible, Terraform, Kubernetes/OpenShift, or similar products for application management, orchestration, or infrastructure as code.
- Strong shell scripting and Python programming skills for SRE-related activities.
- Advanced Linux / Unix skills.
- Experience using Splunk or the Grafana/Prometheus/Loki stack.
- General understanding of Veritas Cluster Server, load balancers, and VMware.
- Knowledge of Agile methodologies.
- Effective oral and written communication skills, and the interpersonal skills to work well in a team environment.
- Strong organizational and coordination skills with the ability to manage multiple tasks and high-pressure situations for outage handling, management, or resolution.
- Availability for weekend work.
Desired Skills
- Experience in application support, code release and liaison with development teams highly desired.
- Knowledge of Docker and Kubernetes/OpenShift highly desired.
- Experience with development toolchains such as Git, Bitbucket, and CI/CD tools preferred.
- Experience with relational databases and web servers / application servers preferred.
For over 88 years, Morgan Stanley has combined old school wisdom with a passion for what's possible. Doing so enables us to provide clients with...