
We are looking for an experienced and enthusiastic Senior Site Reliability Engineer to join our DevOps team working to secure and maintain cloud-deployed applications and platform services.

Key Responsibilities

You will be responsible for the end-to-end implementation of new monitoring systems, as well as the application integrations and processes that enable Kinesso to deliver a best-in-class system of customer-facing, globally deployed applications, data management tooling, and web services.

At Kinesso, we’re all about reliability, availability, and uptime. We’re looking for engineers who are as adept at building software and services as they are at administering systems on the command line. You’ll need to be comfortable building things from scratch and believe that anything worth doing twice is worth automating. You’ll come to the table with application and service monitoring expertise and a strong understanding of application security, and you’ll be comfortable working in a fast-paced, highly collaborative environment.

Protecting our customers’ data is Job 1. The ideal candidate for the role will have demonstrated experience in designing and maintaining solutions that operate on a least-privilege, minimum-risk design paradigm while retaining performance, reliability, and flexibility for our customers. Our view of site reliability extends beyond monitoring whether processes are up or down to include application performance and application security. Monitoring for vulnerabilities and potential threats to our services will be part of the job. The ability to provide dashboards, reports, and mechanisms to escalate issues will be key to a candidate's success.

Desired Skills & Experience

  • Broad AWS and GCP experience, specifically with monitoring and alerting tooling such as CloudWatch, SQS, etc.
  • Expert-level experience with hosted and cloud-based monitoring tools and platforms (PagerDuty, Datadog, Sumo Logic, Nagios, New Relic, Splunk, SolarWinds, etc.)
  • Deep understanding of application performance monitoring (APM) tools and practices
  • Experience running SAST, SCA, and DAST tooling and infrastructure
  • Experience building SOPs and administering incident response
  • Ability to build relevant Dashboards and Reports for different internal audiences
  • Significant experience in managing Linux-based infrastructure
  • Experience with operational management of Windows machines
  • Experience in at least two scripting languages
  • Experience with SQL, MySQL, and PostgreSQL databases
  • Experience with one or more of the following HashiCorp tools: Terraform, Vault, Consul
  • Experience with Docker-based containerization platforms/frameworks. AWS ECS and/or EKS is a big plus
  • Knowledge of Java/JVM and .NET Framework-based languages. Development experience is a huge plus
  • Experience in cloud, datacenter, network and data segmentation management practices in a global, multitenant architecture
  • Experience in designing and deploying a variety of data sovereignty-driven (e.g. GDPR) designs
  • Strong critical thinking and problem-solving skills, and a sense of ownership and pride in your performance and its impact on the company's success
  • Excellent team player with strong interpersonal and communication skills, comfortable working closely with multiple groups and departments
  • 5+ years working in production-level environments, preferably in DevOps, SRE, or IT operations