Platform Operations Engineer

NY, United States
Experience: Mid-Level, Senior

Core Responsibilities

  • Manage all Digital Infrastructure backend systems and services currently hosted on AWS, including user access, network connectivity, Linux/Windows systems, databases, and applications.
  • Deploy updates and patches to servers and connected client systems during off-hours maintenance windows.
  • Identify, troubleshoot, and resolve server and client issues by analyzing logs from all digital infrastructure components.
  • Set up and continuously improve monitoring and alerting metrics for all supported platforms.
  • Proactively review key operating metrics and system status to ensure all systems run within recommended operating conditions.
  • Participate in designing and implementing mechanisms for redundancy, failover, and disaster recovery.
  • Develop tools and scripts to automate routine tasks.
  • Collaborate with NOC, DevOps, and Engineering teams to harden, streamline, and document operating processes.
  • Work closely with the Head of Digital Infrastructure to improve the operability, supportability, usability, and visibility of the digital infrastructure.
  • Help continuously improve operational processes to make better use of the underlying cloud resources.

Requirements

  • 5+ years of direct experience operating production digital infrastructure, with strong scripting and system administration skills on both Linux and Windows.
  • At least 3 years of AWS administration experience, including but not limited to OpsWorks, VPC, EC2/ECS, S3, RDS, IAM, ES, and EMR.
  • Working knowledge of the Advanced Message Queuing Protocol (AMQP) and the Extensible Messaging and Presence Protocol (XMPP).
  • Working knowledge of modern operations tools for monitoring and centralized logging.
  • Experience with automation and configuration management using Chef and Ansible.
  • Ability to work with a variety of open source technologies and integrate them with cloud services.
  • Experience managing PostgreSQL, MySQL, MS SQL, and NoSQL clusters.
  • Working knowledge of securing data and ensuring operational redundancy in cloud environments.
  • Ability to evaluate system and application logs, error messages, and stack traces to quickly identify and resolve production problems.
  • Understanding of best practices for data center operations in an always-up, always-available environment.
  • Ability to create and maintain up-to-date infrastructure documentation covering systems, networks, databases, and their interactions.
  • Ability to adhere to established operations procedures and policies.
  • Ability to write clear, step-by-step knowledge base articles that the NOC can follow to resolve known issues.
  • Participate in 24x7 on-call rotations.
  • Bachelor’s degree in a relevant field.

Mission

We’re passionate about connecting highly skilled women with leading companies committed to diversity and inclusion.