Introduction to the Op Language

Updated 4 months ago by Shoreline

What is Op?

As an operator, you perform many manual tasks to maintain, monitor, and troubleshoot applications in the production environment. If you are lucky, these tasks are part of scheduled maintenance: a predetermined time block is published, users are notified, systems are quiesced, you go in, do your work, and get out, then bring the system back up and notify everyone.

More often than not, however, you have to do these manual tasks on demand to handle a page during an on-call rotation. Your organization may pursue the goal of NoOps (the idea that a software environment can be completely automated so that there's no need for an operations team to manage it), but we all know that fixing root causes in software takes weeks to months. During that time, you keep performing manual tasks to mitigate the issue.

At the same time, the number of supported environments, clouds, and software deployments continues to increase, putting further pressure on your ability to meet SLAs.

Shoreline Op is a purpose-built, operations-oriented language designed to allow operators and admins to rapidly:

  • understand, debug, and fix systems during an operational event
  • automate the tasks performed during mitigation in order to reduce or even end future manual processing

How Op Works

While you (the operator) are debugging, the Op interactive CLI allows you to gather and correlate information about resources, metrics, and system state. This gives you greater insight and informs the actions you need to take to mitigate the issue. Afterwards, Op allows you to encode those behaviors into Actions, Alarms, and Bots that automatically search for and fix the issues, without manual intervention going forward.
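To picture how Alarms, Actions, and Bots relate to one another, a small conceptual model may help. The Python sketch below is not Op syntax and does not reflect Shoreline's implementation. It assumes, for illustration only, that an Alarm is a condition checked against a resource, an Action is a remediation step, and a Bot pairs the two. Every name in it (`Resource`, `Alarm`, `Action`, `Bot`, `disk_used_pct`, `clear_disk`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Resource:
    """Hypothetical stand-in for a host or container with some metrics."""
    name: str
    metrics: Dict[str, float]


@dataclass
class Alarm:
    """Assumed model: a named condition evaluated against a resource."""
    name: str
    condition: Callable[[Resource], bool]


@dataclass
class Action:
    """Assumed model: a named remediation step applied to a resource."""
    name: str
    run: Callable[[Resource], None]


@dataclass
class Bot:
    """Assumed model: when the alarm fires on a resource, run the action."""
    alarm: Alarm
    action: Action

    def evaluate(self, resources: List[Resource]) -> List[str]:
        remediated = []
        for resource in resources:
            if self.alarm.condition(resource):
                self.action.run(resource)
                remediated.append(resource.name)
        return remediated


def clear_disk(resource: Resource) -> None:
    # Placeholder remediation: pretend that cleanup freed disk space.
    resource.metrics["disk_used_pct"] = 40.0


disk_full = Alarm("disk_full", lambda r: r.metrics["disk_used_pct"] > 90.0)
cleanup = Action("cleanup", clear_disk)
bot = Bot(disk_full, cleanup)

hosts = [
    Resource("host-a", {"disk_used_pct": 95.0}),
    Resource("host-b", {"disk_used_pct": 50.0}),
]
remediated = bot.evaluate(hosts)
print(remediated)
```

In this toy model only `host-a` crosses the threshold, so it is the only resource the bot touches. The condition and the fix are the same pieces an operator would work out by hand while debugging, and pairing them is what makes the remediation repeatable.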

Using the Op language for both “in the moment” debugging and permanent fixes significantly reduces the time to automate. The statements you use during operational work to identify and remediate an issue can be deployed immediately as permanent detection and remediation for that issue.

The Op Language

Op uses familiar Bash syntax, which makes it easy to learn. Its core primitives allow the operator to focus on encoding how to detect and mitigate an event rather than solving distributed systems problems. Operators tell Op what to look for and how to fix it, and Op handles the undifferentiated heavy lifting, such as propagation, error handling, auditing, and system lease.

In the Op Language Guide articles, we will go through how to use Op for both interactive debugging and permanent automation.  We've also published a Glossary that has full documentation for each of the Op commands.
