Introduction to the Shoreline UI

Updated 2 months ago by Shoreline

Automated Remediation

In the world of DevOps, operators respond to events and updates as they occur, with the goal of keeping systems highly available and performant. You perform many manual tasks to monitor and troubleshoot applications in the production environment. The number of supported environments, clouds, and software deployments continues to increase, putting pressure on your ability to meet SLAs and maintain high availability.

Meeting your SLAs and maintaining high availability is a function of mean time to failure (MTTF) and mean time to repair (MTTR). MTTR for the majority of issues is at least 5x longer than what is required to support SLAs. Detection, notification, and ticket routing take time. But fixing the problem, not just observing and reporting on it, is what matters, and that is where human intervention adds latency and delays resolution. Relying on manual intervention for change management and event handling is error-prone and cannot scale. You need automated monitoring and remediation to meet your organization’s SLAs.
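To make the relationship between MTTF, MTTR, and availability concrete, the sketch below uses the standard steady-state formula, availability = MTTF / (MTTF + MTTR). The function names and the sample numbers are illustrative assumptions and do not come from Shoreline.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR must be non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def max_mttr_for_target(mttf_hours: float, target: float) -> float:
    """Largest MTTR (in hours) that still meets a target availability."""
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    # Rearranged from target = MTTF / (MTTF + MTTR).
    return mttf_hours * (1 - target) / target


# Hypothetical numbers: a failure every 999 hours, repaired in 1 hour.
current = availability(mttf_hours=999, mttr_hours=1)

# With the same MTTF, how fast must repairs be to reach 99.99%?
required_mttr = max_mttr_for_target(mttf_hours=999, target=0.9999)
```

For a fixed MTTF, the only way to raise availability is to shorten MTTR, which is why the time spent on manual detection and repair matters so much.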

UI Managed Objects

Resources - resources are the infrastructure objects in your environment. Resources can be hosts, pods, containers, virtual machines, database instances, etc. Different platforms have different resources, e.g., pods and containers for Kubernetes and virtual machines for AWS, GCP, or Azure.

Metrics - metrics are time series data associated with your resources, such as CPU utilization, latency, throughput, or error rate. You can query the gathered metrics to monitor the environment. You also use metrics to determine whether a condition or threshold has been met that requires an Action to be taken.

Alarms - alarms tell you when something has gone wrong or needs attention. Alarms are defined on metrics, resources, and system state, and are raised when a condition is met.

Actions - actions are shell commands and shell scripts that help you mitigate and remediate the problem that an Alarm has identified. You can define custom actions and encode an entire operational workflow into a runbook using Actions.

Bots - bots bind Alarms and Actions together using IF-THEN-ELSE constructs. Bots specify the action or set of actions to take when the alarm fires.
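The relationship between Alarms, Actions, and Bots can be sketched as a small data model. This is a conceptual illustration only: the class names, fields, metric key, and shell command are hypothetical, and it is not Shoreline's own syntax or API. The command is stored as a string and never executed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Metrics = Dict[str, float]


@dataclass
class Alarm:
    name: str
    condition: Callable[[Metrics], bool]  # True when the alarm should fire


@dataclass
class Action:
    name: str
    command: str  # shell command or script; recorded here, not run


@dataclass
class Bot:
    """Binds an alarm to actions with IF-THEN-ELSE semantics."""
    alarm: Alarm
    then_action: Action
    else_action: Optional[Action] = None

    def evaluate(self, metrics: Metrics) -> Optional[Action]:
        # IF the alarm condition holds THEN pick then_action ELSE else_action.
        if self.alarm.condition(metrics):
            return self.then_action
        return self.else_action


# Hypothetical example: restart a service when CPU utilization exceeds 90%.
high_cpu = Alarm("high_cpu", lambda m: m["cpu_utilization"] > 90.0)
restart = Action("restart_service", "systemctl restart example.service")
bot = Bot(alarm=high_cpu, then_action=restart)

selected = bot.evaluate({"cpu_utilization": 97.0})
```

Here `selected` is the `restart` action because the alarm condition holds. With a reading below the threshold and no ELSE action defined, `evaluate` returns `None` and nothing is run.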

UI Single Pane of Glass

Shoreline is an end-to-end operations tool designed to automate the entire system monitoring and remediation lifecycle across:

  • Kubernetes and EC2 Instances
  • Zones and Regions
  • AWS, Azure, and GCP

From a single pane of glass you can set up and continuously manage your entire environment.

  • Set up:
        - Metrics and resources
        - Monitoring and remediation actions
        - Scripts and bots
        - Integrations to notification and ticketing services
  • Real-time, drill-down, multi-dimensional monitoring dashboard by:
        - Resources
        - Metrics
        - Time period
  • 1-second data sampling for alarm accuracy and to reduce false positives
  • Ability to analyze from a granular host level up to a cluster
  • Real-time, detailed notification of events by:
        - Alarm raised
        - Action execution
        - Alarm resolved
  • Alarms trigger in 10 seconds
  • Alarms can be set on combinations of metrics
  • Alarms can be configured to be muted and/or confirmed based on real-time information to avoid false positives
  • Notification and status of the response to an event:
        - Action(s) taken
        - Bot executed
  • The goal is to resolve issues within seconds
  • Actions and bots run locally and automatically
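
As an illustration of how an alarm on a combination of metrics can be confirmed before it fires, the sketch below requires a combined condition to hold for several consecutive 1-second samples. This is a general technique for reducing false positives, not a description of Shoreline's internals. The metric names, thresholds, and the choice of ten samples are assumptions made for the example.

```python
from typing import Dict, Iterable

Sample = Dict[str, float]  # one reading per metric, taken once per second


def combined_condition(sample: Sample) -> bool:
    """Hypothetical alarm on two metrics: high CPU and a high error rate."""
    return sample["cpu_utilization"] > 90.0 and sample["error_rate"] > 0.05


def confirmed_alarm(samples: Iterable[Sample], confirm_count: int = 10) -> bool:
    """Return True once the condition holds for confirm_count samples in a row."""
    streak = 0
    for sample in samples:
        # A single healthy sample resets the streak, so brief spikes are ignored.
        streak = streak + 1 if combined_condition(sample) else 0
        if streak >= confirm_count:
            return True
    return False


bad = {"cpu_utilization": 95.0, "error_rate": 0.10}
good = {"cpu_utilization": 40.0, "error_rate": 0.00}

sustained = confirmed_alarm([bad] * 10)              # ten bad samples in a row
spiky = confirmed_alarm([bad] * 9 + [good] + [bad] * 9)  # streak is interrupted
```

In this example `sustained` is `True` and `spiky` is `False`: the interrupted run never reaches ten consecutive samples, so the alarm is not raised.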

Accessing the Shoreline UI

Access to the Shoreline UI is through a URL specific to your organization. You log in to the Shoreline server by entering your credentials through your organization's identity management service or Shoreline's.
