Site Reliability Engineer II

Main Location
Redmond, WA, United States

Site Reliability Engineer II, Secure Admin Services


The mission of Microsoft Digital is to power, protect, and transform Microsoft as the voice of our digital transition in the market. As part of Microsoft’s Cloud + AI Group, we are responsible for building, managing, and securing the platform, products, processes, and services that power Microsoft. We build, maintain, and implement a cloud-first approach to our technology and experiences, from custom-built business solutions developing our campus of the future and our productivity and collaboration experiences like Teams and SharePoint, to horizontal third-party solutions like SAP and Adobe. As a steward of Microsoft’s and our customers’ data, a core function of Microsoft Digital is ensuring the security of every aspect of the business. Microsoft Digital is responsible for company-wide information security and compliance, with a strategic focus on information protection, assessment, awareness, governance, and enterprise business continuity. Microsoft Digital’s charter is also to influence and work alongside engineers across the company and with strategic partners to build and grow their cloud products and services. As customer zero, we deploy these services inside Microsoft and then share best practices with enterprise customers at scale across the globe. We have exciting opportunities for you to innovate, influence, transform, inspire, and grow within our organization, and we encourage you to apply to learn more!


Microsoft has been a leading company in computing for decades. We are a global company, relied on by companies, governments, utilities, stores, schools, universities and co-operatives to deliver the things they need to work, every day. In order to make this work, we need to make it reliable. In order to make it reliable, we need you -- someone who already is, or is interested in becoming, a Site Reliability Engineer (also known as SRE), within our SAS Site Reliability Engineering team.


The Site Reliability Engineering (SRE) team provides leadership, direction, and accountability for application architecture, system design, and end-to-end implementation. As a Site Reliability Engineer, you will identify and deliver service improvements using your expertise in services engineering, systems, networks, and software, along with reliability and dependency analysis and scalable system design principles. Strong collaboration skills are required to work closely with other engineering teams, service owners, and support teams to ensure services and systems are highly stable and performant, meeting the expectations of our user base across the company.


Site Reliability Engineering is a hybrid role, comparatively rare in industry but crucially important to how things work behind the scenes today. SREs are people who take engineering-based approaches to solving operations problems; we like infrastructure, we like seeing how the big complicated thing works, and most importantly, we gain great satisfaction from making it better. 


Our Site Reliability Engineers are persistent problem solvers, always focused on mitigating issues and owning a problem until a resolution is in place. To accomplish this, they work in close collaboration with various engineering teams. They are also involved in automation, developing tools to support the DevOps model, and analyzing vast amounts of data to find trends and suggest improvements. Creativity and data-driven decision making are highly valued in this emerging role.


Site Reliability Engineers build, monitor, and maintain the systems and infrastructure that ensure our customers can quickly access their data and run workloads whenever they need to. We identify service problems and areas for improvement, and we help implement solutions. Our work is key to the security and credibility of many Microsoft services and of Microsoft itself. Secure Admin Services provides access to Microsoft’s entire infrastructure and ecosystem in a secure manner.




Key responsibilities: 

  • Provide technical engineering for a cross-functional, highly visible, operations team supporting the secure access services platform for Microsoft’s corporate network.
  • Identify opportunities and drive the implementation of automation to improve service health, manageability, reliability and telemetry.
  • Own, triage, investigate, and resolve service issues with an emphasis on broad communications, learning, and teaching throughout the process.
  • Read, write, configure, design, and script end-to-end service telemetry, alerting, and self-healing capabilities for platforms.
  • Author functional and technical documentation.
  • Communicate on a deeply technical level with product engineering, project management and operations teams to improve and optimize products, improve infrastructure, and evolve services.
  • Remain current on new technologies, methods and procedures including, but not limited to, coding practices such as Test Driven Development, Continuous Integration, and Continuous Deployment.

Required Qualifications:  

  • BA/BS in Computer Science, Computer Engineering, or a related technical discipline, or, in place of a 4-year degree, an equivalent industry internship or industry software engineering experience.
  • Experience with one or more general purpose programming languages including but not limited to: C/C++, C#, Python, PowerShell, JavaScript.
  • Full-stack troubleshooting skills across network, application, hardware, management fabric, and distributed services layers.
  • The successful candidate must be a U.S. Citizen.  

The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:


- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.


- Citizenship Verification: This position requires verification of U.S. citizenship to meet federal government security requirements.


Preferred Qualifications:

  • 2+ years of scripting and programming experience (preferably .NET, PowerShell, Python, C#).
  • Experience with the Microsoft cloud and/or stack including O365, Azure, Windows or other Microsoft software/services.
  • Experience leveraging cloud architecture, applying site reliability principles, and/or demonstrating sensitivity to operational concerns.
  • Demonstrated ability to debug, fix, and optimize code.
  • Excellent troubleshooting skills are a must to be successful in this role.
  • Out-of-the-box, quick, and agile thinking to adapt to a fast-paced and changing environment.
  • Deep knowledge of system design and architecture, and of running complex, large-scale online services.
  • Demonstrated technical experience with site reliability engineering or software development and operations.
  • Experience building distributed cloud-based software services.
  • Fast learner, introspective.
  • Ability to contribute to multiple projects/demands simultaneously.

The ideal candidate will have experience working in a team environment and running and deploying cloud-scale services and platforms, technical depth in the security of cloud platforms, safe deployment paradigms at cloud scale, and agile development practices, and experience in designing and tuning monitoring and telemetry.


Preferred locations: Atlanta, Austin, Redmond, Reston + Remote in the U.S.





Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances.  We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.


