DxSherpa - D

ITSM Best Practices: Problem Management

Many organizations treat problem management as an investigative effort that produces an RCA report for high-profile incidents, while lower-priority incidents are simply closed without their root causes being addressed. When the causes of incidents are not resolved, incident volume rises as new incidents pile on top of those caused by earlier, unresolved issues. Eventually there are more things to fix than there are people to fix them, and IT resources are consumed by recurring incidents. The results are missed deadlines, longer response and repair times, low customer satisfaction, and breaches of service-level agreements. To reduce the number of recurring incidents and their impact, organizations must address the causes of their incidents.

When problem management is merely a reaction to the most visible incidents, the damage is done before the problem management mechanisms are initiated. In this limited-scope model, only the most critical incidents receive attention. These critical incidents, while significant, are uncommon, frequently unpredictable, and brief in their impact.

The Framework

The problem management framework can be used to implement a new problem management practice or enhance an existing one.

The framework walks you through the steps of putting an effective problem management practice in place. This includes establishing the prerequisites for such a practice and defining its scope so that the focus stays within your domain of control. After that, the framework relies on best practices for problem identification, analysis, and resolution.

The phases of the guidance framework are:

Pre-work: The pre-work phase focuses on the requirements for implementing a proactive problem-solving strategy.

Identification and creation: The identification phase is concerned with developing a repeatable procedure for identifying, selecting, and logging problems.

Mitigation: The mitigation phase focuses on establishing a standardized method for reducing incident impact.

Resolution: The resolution phase advises on the best approach based on the level of effort required to solve the problem.

Optimization: The optimization phase focuses on establishing a process for measuring the impact of your efforts and assessing known errors on an ongoing basis.

1.1 Pre-work | Prerequisites

Establishing a shared vocabulary is the first step in the pre-work phase.

1.2 Pre-work | Scope

Your process may be enterprise-wide or focused on a specific area. Before launching your problem management practice, the boundary lines should be clear and approved by your leadership team, regardless of how broad or narrow the scope is.

1.3 Pre-work | Roles and Responsibilities

Define the roles and responsibilities of the people who will work within these boundaries once the scope has been defined. Explain what problem management entails and what team members will be held accountable for.

1.4 Pre-work | Success Metrics

Align your success metrics with the outcomes your organization desires, but limit the scope of these metrics to what you can directly influence.

2.1 Identification & Creation | Problem-area identification

Recognizing problems is the first step in the problem-solving process. Most organizations have a process for creating a problem ticket and RCA in response to a Severity 1 incident, but none for identifying the problems behind recurring incidents. This phase concentrates on identifying issues that result in a high number of recurring incidents.

2.2 Identification & Creation | Problem identification

Next, set a problem detection threshold for recurring incidents. Set the threshold high enough to avoid defining more problems than the problem management process can handle; there is no point in building up a backlog of unresolved problem tickets. As the problems that generate the most incidents are resolved, lower the threshold to identify additional problems. Stop lowering it once the number of incidents no longer justifies the effort required to run a problem through the problem management process.
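The threshold idea can be illustrated with a minimal Python sketch. It assumes incidents are exported as dictionaries with hypothetical "ci" and "category" fields and that recurrence is measured by counting incidents per (CI, category) pair; a real ticketing tool will use its own field names and grouping rules.

```python
from collections import Counter


def detect_problem_candidates(incidents, threshold):
    """Return (group_key, count) pairs whose incident count meets the threshold.

    `incidents` is an iterable of dicts with the hypothetical keys "ci" and
    "category"; these names are assumptions made for this illustration.
    """
    counts = Counter((inc["ci"], inc["category"]) for inc in incidents)
    candidates = [(key, n) for key, n in counts.items() if n >= threshold]
    # Most frequent first, so the biggest incident generators come out on top.
    return sorted(candidates, key=lambda item: item[1], reverse=True)


# Illustrative sample data, not taken from any real system.
incidents = [
    {"ci": "mail-gateway", "category": "delivery delay"},
    {"ci": "mail-gateway", "category": "delivery delay"},
    {"ci": "mail-gateway", "category": "delivery delay"},
    {"ci": "vpn", "category": "login failure"},
    {"ci": "vpn", "category": "login failure"},
    {"ci": "printer-3f", "category": "paper jam"},
]

high = detect_problem_candidates(incidents, threshold=3)  # start strict
low = detect_problem_candidates(incidents, threshold=2)   # lowered later
```

With the stricter threshold only the mail gateway group qualifies; lowering it brings the VPN group into view as well, which mirrors lowering the threshold once the largest incident generators have been dealt with.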

2.3 Identification & Creation | Select Problems

Create a repeatable process for selecting which problems to work on. Keep the process simple at first by focusing on isolated problems, specifically those that are generating the most incidents, before moving on to the systemic problems that generate the most incidents.

2.4 Identification & Creation | Log Problems

Use problem tickets to collect, in a consistent manner, the information needed to start the problem management process. When creating a problem ticket, gather only the bare minimum of information required to get started.
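As a rough illustration of a "bare minimum" ticket, the sketch below defines a small Python record with a completeness check. The field names (title, affected_ci, symptom, related_incident_ids) are assumptions chosen for this example; the text above does not prescribe a specific set of fields, so adapt them to whatever your tool and process require.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ProblemTicket:
    # All field names here are hypothetical, chosen only for illustration.
    title: str
    affected_ci: str
    symptom: str
    related_incident_ids: list = field(default_factory=list)
    opened_on: date = field(default_factory=date.today)
    status: str = "new"

    def missing_fields(self):
        """List the minimum fields that are still empty."""
        required = {
            "title": self.title,
            "affected_ci": self.affected_ci,
            "symptom": self.symptom,
        }
        missing = [name for name, value in required.items() if not value.strip()]
        if not self.related_incident_ids:
            missing.append("related_incident_ids")
        return missing


ticket = ProblemTicket(
    title="Recurring mail delivery delays",
    affected_ci="mail-gateway",
    symptom="",
    related_incident_ids=["INC001", "INC002"],
)
gaps = ticket.missing_fields()
```

A check like this keeps logging consistent without forcing anyone to fill in analysis details before the analysis has started.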

3.1 Mitigation | Known Error Database

Create a KEDB to record known errors, workarounds, symptoms, applicable CIs, and the associated problem. The KEDB provides immediate relief to impacted users by speeding up the agent’s ability to identify workarounds that reduce the impact of incidents associated with known errors.
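A minimal in-memory sketch of such a KEDB is shown below. The record fields follow the list above (known error, workaround, symptoms, applicable CIs, associated problem); the class name, the function name, and the substring-based symptom matching are assumptions made for illustration, not features of any particular ITSM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownError:
    problem_id: str    # the associated problem record
    description: str   # the known error itself
    symptoms: tuple    # phrases agents typically see in incident reports
    cis: tuple         # applicable configuration items
    workaround: str


def find_workarounds(kedb, ci, symptom_text):
    """Return known errors for `ci` whose symptoms appear in the incident text."""
    text = symptom_text.lower()
    return [
        entry
        for entry in kedb
        if ci in entry.cis and any(s.lower() in text for s in entry.symptoms)
    ]


# Illustrative entries only.
kedb = [
    KnownError(
        "PRB0001",
        "Mail queue stalls after certificate rotation",
        ("delayed email", "mail stuck in outbox"),
        ("mail-gateway",),
        "Restart the relay service",
    ),
    KnownError(
        "PRB0002",
        "VPN client rejects valid credentials",
        ("login failure",),
        ("vpn",),
        "Reconnect using the backup gateway",
    ),
]

matches = find_workarounds(
    kedb, "mail-gateway", "User reports delayed email since this morning"
)
```

The lookup is keyed on the CI and the symptom because those are the two things an agent usually knows when an incident arrives.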

3.2 Mitigation | Problem Workarounds

Workarounds are solutions that aim to lessen the impact of incidents. Workarounds are developed internally by problem owners and subject matter experts as part of the problem management process, or externally by resellers.

4.1 Resolution | Assess Known errors

Treat known errors as a backlog that the problem manager must constantly groom to ensure that specialists are working on the problems most critical to the organization. Take the highest-priority known error from your problem backlog and examine how well its workaround is mitigating the problem's impact.

4.2 Resolution | Potential cause identification

The recurrence of incidents indicates the existence of a problem, but the actual causes of the problem are difficult to pinpoint. Identifying and defining the problem and its actual causes is the step in the problem management process where the team should spend the most time.

4.3 Resolution | ROI Determination

Once the causal issues have been identified and the extent to which these causes contribute to the problem has been determined, problem solvers must shift their focus to solutioning.
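One simple way to frame the return on a fix is a payback estimate: how long it takes for the avoided incident-handling cost to cover the cost of the fix. The sketch below is only an illustration. The function name, its parameters, and the assumption that the fix eliminates all of the related incidents are introduced here and do not come from the framework itself.

```python
def fix_payback_months(incidents_per_month, hours_per_incident, hourly_cost, fix_cost):
    """Months until avoided incident-handling cost equals the cost of the fix.

    Assumes (for illustration) that the fix removes every related incident.
    """
    monthly_saving = incidents_per_month * hours_per_incident * hourly_cost
    if monthly_saving <= 0:
        raise ValueError("monthly saving must be positive")
    return fix_cost / monthly_saving


# Illustrative numbers: 20 incidents a month, 1.5 hours each, 50 per hour,
# against a fix estimated at 6000 in the same currency.
months = fix_payback_months(20, 1.5, 50, 6000)
```

A short payback period supports fixing the cause; a very long one may mean the workaround is good enough for now.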

4.4 Resolution | Implement Fixes

An agreed-upon plan of action is required for problems that are amenable to resolution. The plan should list the sequence of steps needed to fix the problem and the parties expected to carry them out. Use the existing change practice to implement the fixes and establish a repeatable method for doing so. Where possible, time the execution of these fixes to coincide with maintenance windows that have already been established. This helps reduce the impact on the organization.

5. Optimization

The final phase of the problem management process is optimization, which focuses on assessing the impact of problem management practices and identifying opportunities to improve outcomes.
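One possible way to quantify that impact is to compare the number of incidents linked to a problem before and after its fix. The Python sketch below assumes two observation periods of equal length and uses a hypothetical function name; it is one candidate metric, not the only one the optimization phase could use.

```python
def incident_reduction(before_count, after_count):
    """Fractional reduction in incidents between two equal-length periods."""
    if before_count <= 0:
        raise ValueError("before_count must be positive")
    return (before_count - after_count) / before_count


# Illustrative figures: 40 related incidents in the quarter before the fix,
# 10 in the quarter after it.
reduction = incident_reduction(40, 10)
```

A negative result means incident volume went up after the change, which is a signal to revisit the cause analysis.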


Author : Animish Raje