Actions: Fixing the Problem

Updated 1 month ago by Shoreline

Thus far, we have learned how Op discovers resources and relationships, how to measure how those resources are operating, and how to set up alarms that fire when an event is triggered. Monitoring and alarming are crucial activities, but they only tell us what is going wrong and when. The problems don't fix themselves, and as operators, we are responsible for mitigating and remediating them.

Instead of performing actions manually, Op lets us automate them and encode an entire workflow. Let's create an action that displays the five most CPU-intensive processes and outputs the PID, command, CPU usage, and memory usage of each.
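To make the goal concrete, here is a minimal Python sketch of the selection the action is meant to perform. It is not part of Op: the `top_cpu` helper and the sample data are hypothetical, and the parser assumes the standard procps `ps aux` column layout (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND).

```python
# Hypothetical helper (not part of Op): pick the N most CPU-intensive processes
# from `ps aux`-style text. Assumes the procps column layout:
# USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
from typing import NamedTuple


class Proc(NamedTuple):
    pid: int
    command: str
    cpu: float
    mem: float


def top_cpu(ps_output: str, n: int = 5) -> list[Proc]:
    procs = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        # COMMAND is the 11th column and may contain spaces, so split at most 10 times
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # ignore malformed lines
        procs.append(Proc(int(fields[1]), fields[10], float(fields[2]), float(fields[3])))
    # Highest %CPU first; sorted() is stable, so ties keep their input order
    return sorted(procs, key=lambda p: p.cpu, reverse=True)[:n]


SAMPLE = """\
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.1 1000 500 ? Ss 10:00 0:01 /sbin/init
app 42 75.5 3.2 5000 2000 ? Sl 10:01 5:00 python worker.py --queue jobs
app 43 12.0 1.0 3000 1000 ? S 10:01 1:00 nginx: worker process
"""

for p in top_cpu(SAMPLE, n=2):
    print(p.pid, p.command, p.cpu, p.mem)
```

Sorting and truncating is the whole idea: whatever runs on the host has to order processes by CPU before cutting the list at five.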

The base Op command to configure this action includes the following parameters:

  • Action Name - must contain only alphanumeric characters, underscores, and dashes, and must be globally unique

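  • Command - the shell command the action runs. Plain `ps aux` lists every process unsorted, so this version sorts by CPU and keeps the header plus the top five rows (it assumes a procps-style `ps` that supports `--sort`):
    op> action high_cpu_action = `ps -eo pid,comm,%cpu,%mem --sort=-%cpu | head -n 6`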

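  • resource_query - a valid Op resource query that selects the resources the action runs on. This one selects containers tagged app="shoreline"; limit=5 caps the selection at five resources and does not control how many processes are reported.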
    op> high_cpu_action.resource_query = host | .pod | .container | app="shoreline" | limit=5
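One way to read the resource query is as a left-to-right pipeline: start from hosts, step down to their pods, then to those pods' containers, keep the ones tagged app="shoreline", and stop after five. The Python below models that reading with hypothetical `Host`, `Pod`, and `Container` classes and a hypothetical `run_query` function; it illustrates the pipeline's shape and is not how Op actually evaluates queries.

```python
# Hypothetical in-memory model of the resource query
#   host | .pod | .container | app="shoreline" | limit=5
# Illustrates the pipeline's shape only; this is not Op's implementation.
from dataclasses import dataclass, field


@dataclass
class Container:
    name: str
    tags: dict[str, str] = field(default_factory=dict)


@dataclass
class Pod:
    name: str
    containers: list[Container] = field(default_factory=list)


@dataclass
class Host:
    name: str
    pods: list[Pod] = field(default_factory=list)


def run_query(hosts: list[Host], app: str, limit: int) -> list[Container]:
    pods = [pod for host in hosts for pod in host.pods]             # host | .pod
    containers = [c for pod in pods for c in pod.containers]        # | .container
    matching = [c for c in containers if c.tags.get("app") == app]  # | app="shoreline"
    return matching[:limit]                                         # | limit=5


hosts = [
    Host("node-a", [Pod(f"pod-{i}", [Container(f"c-{i}", {"app": "shoreline"})]) for i in range(4)]),
    Host(
        "node-b",
        [Pod(f"pod-{i}", [Container(f"c-{i}", {"app": "shoreline"})]) for i in range(4, 7)]
        + [Pod("pod-db", [Container("postgres", {"app": "db"})])],
    ),
]
selected = run_query(hosts, app="shoreline", limit=5)
print([c.name for c in selected])  # ['c-0', 'c-1', 'c-2', 'c-3', 'c-4']
```

Under this reading, limit=5 bounds how many containers the action runs on; seven containers match the tag, but only the first five are selected.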

To verify that the action is defined, use the List command.

By default, all defined objects in Op start in a disabled state. This ensures that we only run actions that are authorized and fully prepared, in keeping with Op's focus on safety and reliability. To enable our action, use the *enable* command.
  • enable - the default is false
    op> enable high_cpu_action
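The naming rule and the disabled-by-default behavior can be modeled with a small registry. This is a hypothetical Python sketch, not Op's implementation: `ActionRegistry`, `define`, `enable`, and `run` are illustrative names, the name pattern is our reading of the rule above (letters, digits, underscores, and dashes), and `run` only returns a string instead of executing anything.

```python
# Hypothetical sketch: unique, validated action names; new actions start disabled.
import re
from dataclasses import dataclass

# Our reading of the naming rule: letters, digits, underscores, dashes only
NAME_RE = re.compile(r"[A-Za-z0-9_-]+")


@dataclass
class Action:
    name: str
    command: str
    enabled: bool = False  # every new action starts disabled


class ActionRegistry:
    def __init__(self) -> None:
        self._actions: dict[str, Action] = {}

    def define(self, name: str, command: str) -> Action:
        if not NAME_RE.fullmatch(name):
            raise ValueError(f"invalid action name: {name!r}")
        if name in self._actions:  # names must be globally unique
            raise ValueError(f"action name already in use: {name!r}")
        action = Action(name, command)
        self._actions[name] = action
        return action

    def enable(self, name: str) -> None:
        self._actions[name].enabled = True

    def run(self, name: str) -> str:
        action = self._actions[name]
        if not action.enabled:
            raise PermissionError(f"action {name!r} is disabled")
        return f"would run: {action.command}"  # placeholder; nothing is executed


registry = ActionRegistry()
registry.define("high_cpu_action", "ps aux")
try:
    registry.run("high_cpu_action")
except PermissionError as err:
    print(err)  # action 'high_cpu_action' is disabled
registry.enable("high_cpu_action")
print(registry.run("high_cpu_action"))  # would run: ps aux
```

Putting the enabled check inside `run` means no code path can execute an action that was defined but never explicitly enabled.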

Our new action is also visible in the Shoreline UI, but it needs more information before it can be synchronized with, and edited in, the UI. This step is optional but highly recommended so that operators can move seamlessly between the CLI and the UI.

The Op commands to synchronize this action to the UI are:

  • start_title_template
    op> high_cpu_action.start_title_template = "high cpu remediation has started"

  • start_short_template
    op> high_cpu_action.start_short_template = "high cpu remediation has started"

  • error_title_template
    op> high_cpu_action.error_title_template = "high cpu remediation resulted in error"

  • error_short_template
    op> high_cpu_action.error_short_template = "high cpu remediation resulted in error"

  • complete_title_template
    op> high_cpu_action.complete_title_template = "high cpu remediation has completed"

  • complete_short_template
    op> high_cpu_action.complete_short_template = "high cpu remediation has completed"
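The six properties form a regular grid: three lifecycle stages (start, error, complete), each with a title and a short message. A hypothetical Python sketch of that structure, including a check that all six are set, is below; `Stage`, `messages_for`, and `missing_templates` are illustrative names, and the assumption that each pair is shown when the action starts, fails, or finishes is inferred from the property names rather than documented behavior.

```python
# Hypothetical model of the six template properties: one (title, short) pair per
# lifecycle stage. Property names mirror the Op commands; the logic is illustrative.
from enum import Enum


class Stage(Enum):
    START = "start"
    ERROR = "error"
    COMPLETE = "complete"


templates = {
    "start_title_template": "high cpu remediation has started",
    "start_short_template": "high cpu remediation has started",
    "error_title_template": "high cpu remediation resulted in error",
    "error_short_template": "high cpu remediation resulted in error",
    "complete_title_template": "high cpu remediation has completed",
    "complete_short_template": "high cpu remediation has completed",
}


def messages_for(stage: Stage, tmpl: dict[str, str]) -> tuple[str, str]:
    """Return the (title, short) pair for a given stage."""
    return tmpl[f"{stage.value}_title_template"], tmpl[f"{stage.value}_short_template"]


def missing_templates(tmpl: dict[str, str]) -> list[str]:
    """List any of the six template properties that have not been set."""
    expected = [f"{s.value}_{kind}_template" for s in Stage for kind in ("title", "short")]
    return [key for key in expected if key not in tmpl]


print(messages_for(Stage.ERROR, templates))
print(missing_templates(templates))  # []
```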

The action is now completely configured to be managed from both the CLI and the UI.

To see all the configured actions, use the List command.
op> list action
