Zendesk is going through a positive transformation to make our products more resilient and reliable. In order to do that, we need to make sure we’re accurately monitoring and observing the behavior of our services, infrastructure, and overall ecosystem. That’s where you come in! We need you to standardize and implement how observability and monitoring are done at Zendesk. You will then work with teams to adopt these standards while baking them into tooling for plug-and-play goodness. The SRE organization is relatively new at Zendesk, so it’s an exciting time to join and put your fingerprint on reliability engineering here. You will have a huge impact on the success of Zendesk by helping raise everyone’s observability and monitoring bar through self-service tooling and by bootstrapping teams to empower successful ownership of their services.
What you'll get to do every day:
Design, develop, evolve, and be responsible for the observability vision and framework at Zendesk
Provide and institute proven practices around observability and instrumentation
Build the tooling necessary to lower the barrier to entry for engineering teams to plug in and enjoy the benefits of observability
Work with teams to identify their SLIs and provide measurements against their SLOs
Provide guidance to engineering teams on how to leverage observability and proactively resolve issues before they become incidents
Build, deploy, and provide production support for any services you own
Analyze the shortcomings of existing systems and propose alternatives
Chip in on other reliability initiatives such as production readiness, code delivery, and architectural and software design discussions
Characteristics and Proclivities:
Curiosity about the unknown, and persistence until you have a solid understanding
Passion for automating manual tasks and processes
Do you equally value empathy, communication, and technical skills?
Do you care deeply about observability and know how and when to make trade-offs?
Are you laser-focused on the needs of internal and external customers?
Do you feel great satisfaction from helping and empowering others?
7+ years of experience in a SWE and/or SRE role
Experience with observability of distributed systems running in Kubernetes on AWS
Proficient in Ruby, Scala, Python, and/or Go
Experience architecting, improving, and operating large-scale distributed systems
Experience building for and supporting a polyglot programming and datastore environment
A proven track record of driving cross-functional initiatives to completion
Ability to debug complex problems across the whole stack
Experience with Datadog, New Relic, PagerDuty, and ELK
Experience working successfully with teams distributed around the globe
Deep understanding of Linux system internals and analytical tools
Zendesk builds software for better customer relationships. It empowers organizations to improve customer engagement and better understand their customers. Zendesk products are easy to use and implement. They give organizations the flexibility to move quickly, focus on innovation, and scale with their growth. Based in San Francisco, Zendesk has operations in the United States, Europe, Asia, Australia, and South America. Learn more at www.zendesk.com.
Interested in knowing what we do in the community? Check out the Zendesk Neighbor Foundation to learn more about how we engage with, and provide support to, our local communities.
Individuals seeking employment at Zendesk are considered without regard to race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, veteran status, or sexual orientation.