Session

Each session of Deployments and Disasters is about resolving a single incident. A game session has 3 stages:

  1. Briefing stage, where the disaster master (DM) explains the scenario they’ve prepared and players pick their roles.
  2. Incident stage, where the DM introduces the incident and players take actions to resolve it.
  3. Review stage, where players discuss future-proofing the system.

The disaster master defines the software system that will suffer the incident and prepares player roles. It’s generally hard to squeeze several incidents into the same system and keep it compact enough for players to grasp quickly. That’s why each Deployments and Disasters session takes place in its own standalone scenario.

RPG concept

In tabletop RPG terms, this makes each Deployments and Disasters session a one-off.

Briefing stage

The main objective of the briefing stage is for all players to familiarize themselves with some basic information. That includes:

  1. Rules of the Deployments and Disasters game itself.
  2. Technical and business aspects of the scenario that the DM prepared.
  3. In-depth info on the software system that their role provides.

If players are from different teams or departments (or different companies altogether), this is the time for everyone to get to know each other.

Incident stage

The incident stage begins with the DM introducing the actual incident in the software system described in the briefing stage. After that, the incident stage is divided into individual turns. On each turn, players can perform any number of actions in an attempt to understand what caused the incident and to resolve it.

The goal of this stage is to resolve the incident in as few turns as possible. This is tracked by the incident clock, which starts at 00:00 and advances by 5 minutes each turn. Depending on the scenario, the DM marks certain key timestamps on the clock, such as the duration after which penalties are paid to clients or the duration that constitutes a breach of contract. If the incident is not resolved by the time the breach-of-contract timestamp is reached, players lose the game.
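For a DM who wants to track the clock digitally, the mechanics above can be sketched roughly as follows. This is only an illustration, not part of the rules: the class name IncidentClock, the marker names, and the marker values of 15 and 30 minutes are all assumptions, and the sketch reads the clock as HH:MM.

```python
from dataclasses import dataclass

TURN_MINUTES = 5  # from the rules: the clock advances 5 minutes per turn


@dataclass
class IncidentClock:
    # Marker name -> minutes after 00:00. Values are chosen per scenario by the DM.
    markers: dict[str, int]
    breach_marker: str = "breach of contract"  # hypothetical marker name
    minutes: int = 0  # from the rules: the clock starts at 00:00

    def advance_turn(self) -> list[str]:
        """Advance the clock by one turn and return the markers reached on it."""
        previous = self.minutes
        self.minutes += TURN_MINUTES
        return [
            name
            for name, at in self.markers.items()
            if previous < at <= self.minutes
        ]

    def players_lost(self) -> bool:
        """True once the breach-of-contract marker has been reached."""
        return self.minutes >= self.markers[self.breach_marker]

    def display(self) -> str:
        """Render the clock as HH:MM (an assumed reading of '00:00')."""
        return f"{self.minutes // 60:02d}:{self.minutes % 60:02d}"


# Example scenario with made-up marker values.
clock = IncidentClock(markers={"client penalties": 15, "breach of contract": 30})
for _ in range(3):
    reached = clock.advance_turn()
print(clock.display(), reached)  # 00:15 ['client penalties']
```

Keeping the markers as plain minute offsets means a scenario can define any number of them without changing the clock itself.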

Players can gain some extra time (effectively pushing timestamps back) by publishing a detailed client impact report by the end of the 3rd turn. The report should include:

  • A list of impacted clients.
  • A list of impacted features.
  • The nature of impact (increased latency or error rates, decreased throughput, etc.).
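The report rule can be sketched in the same spirit. The rules do not state how much time a report buys, so bonus_minutes below is a hypothetical, scenario-specific value; ImpactReport, push_back_markers, and the sample client and feature names are likewise illustrative assumptions.

```python
from dataclasses import dataclass

REPORT_DEADLINE_TURN = 3  # from the rules: publish by the end of the 3rd turn


@dataclass
class ImpactReport:
    impacted_clients: list[str]
    impacted_features: list[str]
    nature_of_impact: str  # e.g. "increased latency", "decreased throughput"

    def is_complete(self) -> bool:
        """A report counts only if all three required parts are filled in."""
        return bool(
            self.impacted_clients
            and self.impacted_features
            and self.nature_of_impact.strip()
        )


def push_back_markers(
    markers: dict[str, int],
    report: ImpactReport,
    published_on_turn: int,
    bonus_minutes: int,  # assumption: the amount is decided by the DM
) -> dict[str, int]:
    """Return markers shifted later if the report is complete and on time."""
    if published_on_turn > REPORT_DEADLINE_TURN or not report.is_complete():
        return dict(markers)  # no bonus: markers stay where they were
    return {name: at + bonus_minutes for name, at in markers.items()}


# Example with made-up clients, features, and marker values.
markers = {"client penalties": 15, "breach of contract": 30}
report = ImpactReport(["Acme"], ["checkout"], "increased error rates")
print(push_back_markers(markers, report, published_on_turn=2, bonus_minutes=10))
# {'client penalties': 25, 'breach of contract': 40}
```

Whether a report is detailed enough remains the DM's call; the completeness check here only confirms that none of the three parts is missing.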

Review stage

This stage is modelled after your company’s post-incident review. This is where players get to discuss:

  1. All the factors that contributed to the incident.
  2. What actions were performed in order to resolve the incident.
  3. What can be done to prevent the incident from repeating.
  4. What can be done to improve the incident management process itself.

If your company uses a standardized reporting format for this type of meeting, players can fill out a mock version of it.

In addition to the post-incident review, the disaster master can use this stage to collect feedback on the game itself. This can be useful for scaling the game.