Job Type
Full Time
Job Details
To get the best candidate experience, please consider applying for a maximum of 3 roles within 12 months to ensure you are not duplicating efforts.
Job Category
Software Engineering
About Salesforce
We’re Salesforce, the Customer Company, inspiring the future of business with AI+Data+CRM. Leading with our core values, we help companies across every industry blaze new trails and connect with customers in a whole new way. And we empower you to be a Trailblazer, too, driving your performance and career growth, charting new paths, and improving the state of the world. If you believe in business as the greatest platform for change and in companies doing well and doing good, you’ve come to the right place.
Salesforce’s Centralized Incident Response Team is an expert group of technical incident management engineers whose primary responsibility is ensuring the rapid mitigation of any severe events undermining the success of Salesforce’s customers and partners. We prioritize availability above all else and participate in a closed loop wherein insights are continuously evaluated as opportunities for improving how we prepare for, respond to, and learn from incidents. In service of these goals, we operate a centralized capability with maximum efficiency and are actively searching for strong engineers to join our Salesforce team!
As an Incident Management Engineer at Salesforce, you will lead technical response efforts during the most critical impacts on customer availability. Your responsibilities will include incident command, investigation response, and layers of triage and diagnostics. This spans an array of domains and infrastructure and involves partnering with the entire technology organization to drive action and rapid mitigation. An Incident Management Engineer is passionate about resolving technical problems as sophisticated as they are unique.
This engineer takes an iterative approach and leaves every response ready to analyze the entire effort, clearly communicating any opportunities for improvement, including in their own performance. Technical competence is desired for this role, but the position requires that individuals demonstrate mastery of streamlined, persistent, and controlled incident command.
Key Responsibilities:
- Provide experienced execution of the incident command process, including running and handling high-severity incident bridges and driving transparent communication that promotes maximum levels of internal and external customer satisfaction
- Collaborate with an array of technical stakeholders and executives to drive resolution during incidents and improve overall response for future incidents and technical escalations
- Utilize top-notch troubleshooting techniques to identify, organize, and advocate for novel solutions to remediate customer impact on complex interconnected systems
- Participate in a closed-loop post-incident learning process driving insights and meaningful action
- Drive iterative improvements in response through consistent drills, tabletops, and game-day exercises
- Push the boundaries of innovation in incident management to deliver best-in-class incident response
Qualifications:
- 5+ years of technical experience in a large enterprise or SaaS environment, handling highly complex issues at scale
- 3+ years managing, coordinating, and ensuring resolution of major incidents
- 3+ years Site Reliability Engineering or equivalent Production Engineering function
- Customer-centric attitude with a focus on providing best-in-class incident response for customers and stakeholders
- Passion for consistently responding to and leading complex incidents in a 24x7x365 environment utilizing a globalized follow-the-sun model
- Enthusiasm for incident management theory and frameworks (ICS/NIMS etc.)
- Unparalleled troubleshooting and problem-solving skills
- Expertise in managing enterprise-level escalations with a high degree of executive visibility and scrutiny including managing, prioritizing, and delegating multiple escalations and workstreams at once
- Strong leadership skills during periods of significant business impact, remaining calm and professional in high-pressure situations
- A strong desire to drive customer success with partner teams and management on high-profile issues critical to the long-term success of the business
- Outstanding verbal and written communication skills with the ability to convey information in a meaningful way to both engineers and executive-level management, during and outside of incidents
- Adaptable to a wide variety of technologies and capable of incident response and troubleshooting activities in complex interconnected environments
- A strong background in cloud architectures and an understanding of fundamental technologies such as DNS, load balancing, SSL, TCP/IP, SQL, and HTTP
- Excellent project management skills, including demonstrated ability to manage projects across teams where influencing skills are required
- 5+ years of broad engineering experience in highly complex distributed systems
- Experience with Amazon Web Services (AWS), containerized applications and microservices (such as Kubernetes), log parsing/analysis tools such as Splunk, and Oracle SQL
- Prior experience creating and using dashboards in tools such as Grafana
- Experience taking part in blameless retrospectives, learning from incidents, and conducting post-incident investigations, including incident analysis as well as performance evaluations of responders
- Knowledge of the Salesforce platform
About the Company
Salesforce
San Francisco, CA, United States