Job Details
DESCRIPTION
Job Summary:
Provides development, execution and continuous improvement of monitoring/observability solutions, working with the infrastructure and application teams; defines and builds observability standards for infrastructure and application platforms; creates detailed design and reference documents for monitoring tools.
Key Responsibilities:
Works autonomously, with professional communication and technical writing skills.
Implement, maintain and consult on the observability and monitoring framework that supports the needs of multiple internal stakeholders; manage the event and operations Escalation Management policies.
Build a practice of performance monitoring and tracing using the Dynatrace APM platform, drawing on hands-on experience with Splunk, CloudWatch, ServiceNow and SolarWinds.
Partner with our application development and technical operations teams to define and implement observability standards and practices.
Engineer solutions and implement standards for IT Ops and Application Performance Monitoring infrastructure and agent deployments, including optimization and tuning, while maintaining and troubleshooting the current monitoring infrastructure.
Continue evolving monitoring tooling toward a standards-based, self-service automated platform using infrastructure-as-code and scripting tools such as Terraform, Python or similar tools.
Build, implement, and tune dashboards, scripts, and alerts to achieve observability of services and infrastructure.
Facilitate the onboarding of services into monitoring platforms and adopt monitoring tools for use on both on-premises and cloud platforms.
Provide guidance for application/operational teams to set up policies, alerts, dashboards, and custom configuration to get full visibility into their entire environment.
Work closely with leaders and team members to compose stories, design features, and prioritize tasks.
Identify monitoring gaps/risks and establish prioritized mitigation plans.
Perform correlation analysis leveraging big data to drive proactive and predictive monitoring capabilities, improving Mean Time To Resolution (MTTR) through data visualization and complex problem solving.
Help define and achieve Service Level Indicator, Service Level Objective and Service Level Agreement objectives.
Maintain strong relationships to deliver business value using relevant Business Relationship Management practices.
RESPONSIBILITIES
Competencies:
Event Management - Manages events through their life cycle from detection and understanding to appropriate resolution using the required processes and tools to continuously improve services.
Availability Management - Ensures the availability of IT services complies with service level agreements using the required processes and tools in order to balance customer and budgetary requirements.
Agile Development - Uses API-First Development where requirements and solutions evolve through the collaborative effort of self-organizing and cross-functional teams and their customer(s)/end user(s) to construct high-quality, well designed technical solutions; understands and includes the Internet of Things (IoT), the Digital Mesh, and Hyper Connectivity as inputs to API-First Development so solutions are more adaptable to future trends in Agile development.
Cloud Computing Services - Leverages on-demand network delivery of compute power, storage, database, applications and other information technology (IT) resources using Cummins IT standards, tools and methodologies to rapidly provision and release solutions.
Data Analytics - Discovers, interprets and communicates qualitative and quantitative data; determines conclusions relying on knowledge of business or functional frameworks; simultaneously applies statistics, data validity, data visualization, and problem solving approaches to effectively extract meaningful patterns and business insights; presents conclusions and outcomes that enable data driven business decisions.
Incident Management - Maintains reported issues or requests assigned via the Incident Management system to log actions taken and track trends.
IT Operational Support - Executes ongoing activities and procedures required to manage and maintain IT services to deliver agreed service levels.
Performance Tuning - Conceptualizes, analyzes and solves application, database and hardware problems using industry standards and tools, version control, and build and test automation to meet business, technical, security, governance and compliance requirements.
Service Continuity Management - Ensures IT services (hardware, networks, etc.) are available using the required processes and tools in order to meet the agreed needs, requirements and timescales of the business.
Communicates effectively - Developing and delivering multi-mode communications that convey a clear understanding of the unique needs of different audiences.
Manages complexity - Making sense of complex, high quantity, and sometimes contradictory information to effectively solve problems.
Values differences - Recognizing the value that different perspectives and cultures bring to an organization.
Business Need Definition - Defines the business outcome that the proposed work will provide using the Business Analysis Toolkit (modeling the five aspects and creating use cases) to justify investment of resources (people, time, finances).
Education, Licenses, Certifications:
College, university, or equivalent degree in computer science, engineering or related subject, or relevant experience required. This position may require licensing for compliance with export controls or sanctions regulations.
Experience:
5 - 7 years of experience with operating system, observability and monitoring tools is highly preferred.
QUALIFICATIONS
This role requires playing a primary role in making changes to working procedures, schedules, or methods. It also requires following existing SOPs, creating new SOPs, aligning templated solutions to new applications, and aligning delivery to application team requirements.
This role requires creating new processes and automations, and rolling out new and existing templated solutions to both operations and product teams.
This role will communicate regularly with product managers, the Cloud Platform team, technical leads, developers, Operations, cloud engineers, scrum masters, architecture, Cyber and external software/tool vendors. The purpose of this communication is to understand requirements, suggest alternative approaches where applicable, and influence solution design. Because there are different ways to solve a technical problem, this role needs to present the pros and cons of the options and help select the best one for Cummins. This role requires a lot of collaboration with different teams to create self-service solutions. 20-30% of the time could be spent on communication.
Job: Systems/Information Technology
Organization: Cummins Inc.
Role Category: On-site with Flexibility
Job Type: Exempt - Experienced
ReqID: 2423050
Relocation Package: No
100% On-Site: No