As a Service Reliability Engineer working on UST Service Engineering group’s critical production applications and infrastructure, your mission will be to ensure services are always fast, available, scalable and engineered to withstand unparalleled demand. You will be in the thick of solving the problems of systems at scale in a way most engineers never experience. This position requires the flexibility and aptitude to zoom in to fine-grained detail, and the agility to zoom right back out and up the stack. Delve into how software performs, packets flow, and hardware and code interact, in support of managing services, steering global traffic and predicting and preventing failures.
You will manage, automate, and make data-based decisions and judgment calls which influence globally distributed applications. You will also be driving performance and reliability from software and infrastructure at massive scale -- where dealing in petabytes and gigabits and shifting by orders of magnitude is routine. You will tackle challenging, novel situations every day and work with just about every other engineering and operations team at Microsoft. You will be looked upon as an expert and advocate to fellow engineers on making design and reliability trade-offs in running large-scale services and engineering complex systems that are resilient to failure.
As a successful candidate for this role, you will have strong analytical and troubleshooting skills, fluency in coding and systems design, solid communication skills and a desire to tackle the unique, complex problems of scale. We are particularly interested in software engineers familiar with aspects of running web services at scale -- depth in networking technologies, caching solutions and data/databases is a strong plus.
- Manage availability, latency, scalability and efficiency of services by engineering reliability into software and systems.
- Focus on operability of the service, including security, privacy, resiliency, business continuity and disaster recovery.
- Respond to and resolve emergent service problems and write software/build automation to prevent problem recurrence.
- Participate in service capacity planning and demand forecasting, software performance analysis, machine learning and system tuning for resource optimization.
- Contribute to and implement base infrastructure. Create, review, and influence ongoing design, architecture, standards, and methods for operating services and systems.
- Maintain an unwavering focus on Quality of Service.
- Execute with high accountability to schedule and quality.
- Problem resolution - Timely resolution of critical production systems issues using technical and problem-solving expertise.
- Analyze system trending - Provide statistical trending and analysis expertise, including trends in system capacity, threshold testing and monitoring, and performance characteristics.
- Evaluate systems and technology - Assess new system designs and technical strategies.
- Evaluate and improve the client experience of interacting with services, including network, latency, reliability and availability.
- Create measures, tools and reports on client and service performance.
- Drive design and code changes to deliver improvements.
- Contribute to driving service monitoring requirements alongside feature design.
- Perform monitoring gap analysis after implementation and create new monitors to close the gaps.
- The position requires participation in a 24x7 on-call rotation.
- Participate actively in code reviews and bug/issue triage with the feature teams, and support well-informed decisions towards business and engineering goals.
- Closely collaborate with partner teams when engineering & business dependencies exist.
- BS degree in Computer Science or related technical field or equivalent practical experience.
- Experience in data structures, algorithms and complexity analysis.
- Experience with Azure or cloud services.
- Candidates must demonstrate a deep understanding of the technical concepts behind, and prior applied experience with, at least two core technologies (SQL, IIS, SAN, Windows 2008/2012, IP Networks, etc.), plus the ability to quickly incorporate information about new technologies.
- Experience managing online services in Windows and SQL Azure cloud computing.
- Strong fundamentals in TCP/IP concepts, load balancing, GTM, ACL, routing, data/database engineering, design and manageability.
- Experience analyzing IP networking, network performance and application issues using standard tools.
- Experience creating, developing, delivering, deploying and maintaining large-scale cloud services.
- 6 years of relevant work experience, including experience in a high-volume or critical production service environment.
- 2+ years of managing services in cloud.
- 3 years of relevant work experience in any of the following: C#, VB.NET, PowerShell, or other web-related technologies.
- Expertise in analyzing and troubleshooting large-scale distributed systems.
- Ability to handle periodic on-call duty as well as out-of-band requests.
- Collaboration - Involvement with development projects from the envisioning phase through release and support. Your contributions will help ensure that solutions perform, scale, and are highly available, with an eye on improving operational practices and procedures.
- 6+ years of professional work experience in managing large-scale systems with MS server products required, including but not limited to Windows 2012, SQL 2012/2014, IIS, and OWIN.
- Experience with modern deployment methodologies such as Micro Deployments, Continuous Deployment, and Continuous Integration.
- Working knowledge of performance monitoring, remote administration, client/server architecture, web application management, internet technologies and network architecture.
- MCSE/MCSD or CCNA certification a plus.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.
Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.