SITE RELIABILITY ENGINEER

Posted 5 days ago
Main Location
Austin, TX, United States

OUR STACK

APPLICATION MONITORING: Instana, CloudWatch, ELK, Prometheus, Influx, Grafana
ALERTING SYSTEM: PagerDuty
INFRASTRUCTURE AS CODE: Terraform, CloudFormation
AWS CLOUD: Lambda, EKS, ECS, EC2, DynamoDB, Aurora, Kinesis, SNS, SQS, Redshift, API Gateway, CloudFront, S3, IAM
EDGE-SIDE COMPUTE / WAF: Cloudflare
CI/CD: GitHub Actions, Jenkins, TeamCity
SOURCE CONTROL: GitHub
PROCESS: Jira, Confluence, GitHub

WHAT IS EXPECTED OF YOU

  • Communicate effectively with a diverse team
  • Understand best practices for software reliability and availability
  • Have experience creating alerts and dashboards in CloudWatch, Grafana, or other APM tools
  • Have experience working with a development team to plan and build out tasks for setting SLIs and SLOs for new and legacy features, and be able to influence the application’s SLA
  • Demonstrate an understanding of the principles of the Incident Command System
  • Have broad knowledge of AWS compute, storage, message queueing, and networking resources
  • Have experience writing GitHub Actions / Jenkins pipelines
  • Have experience writing complex multi-resource Terraform / CloudFormation stacks from scratch
  • Have experience managing and deploying applications to Kubernetes


QUALIFICATIONS

  • Experience with modern APM technologies, dashboarding, setting application alarms, and incident response
  • 3+ years of hands-on DevOps / SRE experience
  • 3+ years of experience with Python / Node.js
  • 3+ years of experience working with AWS using Infrastructure as Code
  • 3+ years of experience writing software development pipelines
  • 2+ years of experience with Kubernetes and Helm charts
  • 2+ years working closely with software development teams


PREFERRED QUALIFICATIONS

  • Experience with Instana, CloudWatch, ELK, Prometheus, Influx, Grafana
  • Experience working with teams to define SLIs and SLOs for new and existing services
  • Experience with GitHub Actions or Jenkins CI
  • Experience with Kubernetes and EKS
  • Experience with the Incident Command System for software incidents (facilitating postmortems)
uShip, Inc.