Actions: Fixing the Problem
Thus far, we have learned how Op discovers resources and relationships, how to measure how those resources are operating, and how to set up an alarm when an event is triggered. Monitoring and alarming are crucial activities, but they only tell us what is going wrong and when. The problems don't fix themselves; as operators, we are responsible for mitigating and remediating them.
Instead of making us perform actions manually, Op lets us automate them and encode an entire workflow. Let's create an action that displays the top 5 most CPU-intensive processes and outputs the PID, command, CPU usage, and memory usage of each.
The base Op command to configure this action includes the following parameters:
Action Name - must be globally unique and may contain only alphanumeric characters, underscores, and/or dashes
Command - the shell command that the action runs
op> action high_cpu_action = `ps aux`
resource_query - must be a valid Op statement that identifies the Op resources the action is taken on
op> high_cpu_action.resource_query = host | .pod | .container | app="shoreline" | limit=5
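The `ps aux` command in the action definition lists every process rather than only the five busiest. As a minimal sketch of how the shell side could be narrowed to the stated goal, the helper below keeps the header row and the N rows with the highest CPU percentage. The function name `top_by_cpu` is hypothetical and not part of Op, and the sketch assumes a `ps` that accepts the `pid`, `pcpu`, `pmem`, and `comm` output fields (as procps on Linux does).

```shell
# Hypothetical helper (not an Op built-in): reads ps-style output on stdin,
# where column 2 holds the CPU percentage, and prints the header row
# followed by the N rows with the highest CPU usage (default N = 5).
top_by_cpu() {
  n="${1:-5}"
  {
    # Pass the header line through untouched.
    IFS= read -r header && printf '%s\n' "$header"
    # Sort the remaining rows numerically on column 2, highest first.
    LC_ALL=C sort -k2,2 -nr | head -n "$n"
  }
}

# Example invocation on a host (assumes procps-style field names):
#   ps -eo pid,pcpu,pmem,comm | top_by_cpu 5
```

Placing the command name in the last column keeps the numeric sort key in a fixed position even when a process name contains spaces. Whether a pipeline like this can be placed directly inside the backticks of an Op action definition is not confirmed by this section, so test it before relying on it.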
To verify that the action is defined, we use the List command.
By default, all defined objects in Op start in a disabled state. We want to make sure that we only perform authorized and fully prepared actions. Op is all about safety and reliability. To enable our action, use the *enable* command.
- enable - the default is false
op> enable high_cpu_action
Our new action is also visible in the Shoreline UI, but it needs more information before it can be synchronized with and edited in the UI. This step is optional but highly recommended, so that operators can use both the CLI and the UI seamlessly.
The Op commands to synchronize this action to the UI are:
op> high_cpu_action.start_title_template = "high cpu remediation has started"
op> high_cpu_action.start_short_template = "high cpu remediation has started"
op> high_cpu_action.error_title_template = "high cpu remediation resulted in error"
op> high_cpu_action.error_short_template = "high cpu remediation resulted in error"
op> high_cpu_action.complete_title_template = "high cpu remediation has completed"
op> high_cpu_action.complete_short_template = "high cpu remediation has completed"
The action is now completely configured to be managed from both the CLI and the UI.
To see all the configured actions, use the List command.
op> list action