
Dev/Ops Sr. Engineer - Monitoring & Automation (Phoenix, Arizona; St. Petersburg, Florida)

United States
Employment: Full Time
Experience: Senior

If you are interested in investing your time and energy into creating innovations that make a difference for a global IT services organization, then join the Enterprise Monitoring, Tooling and Engineering team at American Express. Be a part of the team responsible for introducing and supporting technology that improves the availability, performance and efficiency of American Express’ IT operations.

EMTE seeks a Senior Infrastructure Engineer with the ideas, knowledge, and strengths to help us deliver a world-class monitoring platform. Using Agile methods and Scrum/Kanban processes, this individual will be responsible for EMTE’s efforts to raise the bar for operational excellence, performance, availability and automation. This Senior Infrastructure Engineer will align all designs with American Express’ architectural enterprise standards and promote the adoption of monitoring/automation best practices. This individual’s performance and outcomes will be measured, in part, by the engineer’s ability to:

  • design, produce, support and continuously improve EMTE’s monitoring tools
  • increase the operational stability and efficiency of EMTE’s monitoring platforms
  • create greater visibility of AET performance and availability
  • lead and collaborate with team members, technology partners and other stakeholders to create innovative solutions that achieve personal goals and those set by organizational leaders and the team.
As a Senior Infrastructure Dev/Ops Engineer you will:
  • Lead team workgroup(s) through efforts to maintain and improve upon platform availability
  • Develop, implement and support efforts to “Monitor the Monitor” – creating greater visibility into the system and application health of EMTE monitoring tools; improving stability, alert notifications and related KPIs for MTTx
  • Manage/Perform administration for EMTE tools/platforms (e.g., APM, Enterprise Logging as a Service, system/node monitoring, Event Correlation, etc.)
  • Monitor environment and computing resources for reporting and capacity planning.
  • Help evaluate changes/updates to the EMTE monitoring tools to determine if they could impact availability of production systems and coordinate with all appropriate stakeholders as needed
  • Assist with the administration/support of other EMTE platforms as necessary
  • Be available to provide on-call support for monitoring and automation tools during business hours, nights and weekends
  • Conduct performance analysis of EMTE processes and workload to identify opportunities for greater efficiency
  • Research and evaluate capabilities available within existing tools or recommend new tooling required to realize greater efficiencies, performance and availability gained through automation
  • Promote automation solutions for operations within EMTE and for stakeholders
  • Contribute to the implementation of an automation framework designed to increase the operational efficiency and stability of American Express’ IT Operations processes, tools and infrastructure
  • Develop, document and implement enterprise standards and procedures for monitoring tools and processes
  • Work closely, at a deep technical level, with engineering teams to ensure solution designs are consistent with American Express Technology’s architectural vision, platform/product road maps, enterprise standards, guidelines and principles
  • Collaborate with delivery teams to build IT strategies in line with company and platform standards
  • Ensure compliance with security standards, and assist in audit preparations.
  • Adopt DevOps methods in support of monitoring and automation tools/services
  • Help bridge the gap between application development and infrastructure teams.
  • Troubleshoot issues that span hardware, software, applications and network services.
  • Follow Incident/Problem/Change Management, SOX and PCI processes
  • Function as an active member of an Agile team, consistently contributing to the team and its Agile practices (tools, common components, and documentation) and Scrum processes
  • Perform all activities in a timely manner, as required, to contribute toward Enterprise-level compliance of internal/external processes, standards and regulatory controls.
Qualifications:
  • 6-8 years of experience with systems analysis/programming, incorporating: design methodology, Infrastructure operations support or engineering (e.g., Network, Server, Application, Database)
  • Hands-on experience with a variety of software languages, operating systems, or network protocols
  • 3+ years of experience managing team/workgroup activities
  • Bachelor’s Degree or equivalent experience in related field required
  • Prior experience in a DevOps or DevOps-like environment (practices that emphasize the collaboration and communication of both software developers and operations engineers)
  • Practical experience applying Agile or other rapid application development methods
  • Self-motivated leader who can effectively collaborate in team and cross-team settings. Ability to persuade and influence without direct control.
  • Able to prioritize and manage tasks and support team members across multiple work streams
  • Strong analytical, logical reasoning and problem solving skills
  • Strong written and verbal communication skills, with the ability to influence cross-functional teams, business and/or vendor partners, and technology leaders
  • Able to develop/make presentations, facilitate discussions and provide technical demonstrations in 1:1, small group and large group settings.
Preferred Experience:
  • Ability to read and write in at least one scripting language (Perl, PowerShell, Bash, etc.)
  • Prior experience with at least one Version Control System (Git, Subversion, CVS, etc.)
  • Working knowledge of CI/CD tools (e.g., Jenkins)
  • Working knowledge of ServiceNow
  • Expertise supporting Unix/Linux systems (RHEL 6/7)
  • Expertise supporting J2EE applications (JBoss, WebLogic, WebSphere, etc.)
  • Expertise with Workload Automation distributed systems
  • Expertise with, and administration of, Ansible/Tower, Puppet and/or Chef
  • Prior experience using/administering Open Source or Commercial Off-the-Shelf monitoring tools used for log monitoring, time series data, Application Performance Management, infrastructure/node monitoring or Event Correlation (e.g., Splunk/Elastic, AppDynamics, Dynatrace, ICINGA, Tivoli, BMC Patrol,
  • Prior experience supporting Network Infrastructure (TCP/IP; Layer2/Layer3)
  • Prior experience or understanding of Data Center operations/methodologies
  • Prior experience with an Enterprise SOA environment
  • Prior experience with Cloud Computing environments (EC2, OpenStack, etc.)
  • Working knowledge of Application Development workflow and Agile Methods
  • Experience working with Scrum or Kanban-related tools and concepts (e.g., Jira, Rally, Epics, Stories, estimating story points, etc.)
  • Knowledge of SOX, PCI and other regulatory standards helpful
