As an Uber SRE, you will work with a team of SRE colleagues. You may be the tech lead for, or a participant in, multiple projects: developing plans, negotiating engagement details with development partners, and organizing the work of your SRE team.
Join Uber Site Reliability Engineering and help us redefine what it means to be an SRE in 2018! As an Uber SRE, you will join a team of reliability engineers who partner with development teams throughout the organization, with the ultimate goal of improving the reliability of Uber products, features, and flows.
An Uber SRE spends just as much of their time working on systems as they do writing code. You’ll be tasked with all manner of work: building operational tooling, automating operational workflows, performing architecture and design reviews, investigating system failures and complex outages, improving our monitoring infrastructure, defining service level objectives and agreements for Uber products and flows, and much more.
We hire SREs at all levels.
Work with development partners to shape the architecture, design, and implementations of new and existing systems to enhance their reliability, performance, efficiency, and scalability
Ensure all key services are measured and monitored, and that they raise alerts when needed
Automate deployment and configuration processes
Develop reliability tools and frameworks for use by all engineers
Share on-call for Uber’s most critical systems and lead incident response and no-blame postmortem analysis and review
Drive efficiencies in systems and processes: capacity planning, configuration management, performance tuning, monitoring, and root cause analysis
We are experts in Uber infrastructure and best practices, and we help development teams use infrastructure more effectively.
We are on point for capacity planning, and we help teams anticipate and prepare for growth.
What you'll need
Grit, drive and a deep feeling of ownership.
BS or MS in Computer Science or a related technical discipline. Equivalent practical experience is a reasonable substitute.
Experience in the Linux environment and a good understanding of its fundamentals and internals: filesystems and modern memory management, threads and processes, the user/kernel-space divide, etc.
A good understanding of large-scale distributed systems in practice, including multi-tier architectures, application security, monitoring and storage systems.
Working knowledge of the TCP/IP stack, internet routing and load balancing.