Use Case EDA Prometheus

Overview

This use case demonstrates Event-Driven Ansible (EDA) with Prometheus and its companion, Alertmanager.

Prerequisites

  • Provide AWS credentials (the use case currently works only on AWS)

Caveats

  • Be aware that Prometheus can take some time to detect an event:

    • A detected issue first goes into the pending state and then into the firing state
    • The Prometheus settings are tuned for this use case to make detection as fast as possible
  • When an event goes into the firing state, it is sent to Alertmanager.

    • Alertmanager will forward it immediately to the configured event stream in the running rulebook activation
  • EDA sometimes ignores events. It often makes sense to run a couple of use cases or, if time permits, to retry

  • Running the use-case playbook again sometimes gives errors, mostly access errors. The root cause is unknown, but rerunning the playbook usually fixes them.
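The pending and firing states mentioned in the caveats can be illustrated with a small simulation. This is a simplified sketch, not the demo's actual configuration: the function name alert_states and all interval values are hypothetical, and it only models rule evaluation (scrape timing and any Alertmanager grouping delay are ignored). It assumes the usual Prometheus behaviour in which an alert becomes pending at the first evaluation where the condition holds and firing once it has held for the rule's "for" duration.

```python
def alert_states(condition_true_from, for_duration, eval_interval, horizon):
    """Return (time, state) for each rule evaluation up to `horizon` seconds.

    All arguments are in seconds. The values used below are illustrative
    assumptions, not the settings shipped with the demo.
    """
    states = []
    active_since = None  # evaluation time at which the alert became pending
    t = 0
    while t <= horizon:
        if t >= condition_true_from:
            if active_since is None:
                active_since = t
            # The alert fires once it has been active for the "for" duration.
            state = "firing" if t - active_since >= for_duration else "pending"
        else:
            active_since = None
            state = "inactive"
        states.append((t, state))
        t += eval_interval
    return states


# Hypothetical numbers: the problem starts at t=7s, rules are evaluated
# every 5s, and the rule must hold for 10s before it fires.
timeline = alert_states(condition_true_from=7, for_duration=10,
                        eval_interval=5, horizon=25)
for moment, state in timeline:
    print(f"t={moment:>2}s  {state}")
```

With these example numbers the alert only reaches the firing state 13 seconds after the problem started, which is why a short wait during the demo is normal even with tuned settings.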

How to use

There is currently one demo available:

Demo 1

Demo 1 can be deployed using the workflow “Deploy EDA Prometheus Demo 1”.

  • The demo will deploy:
    • one instance “prometheus”, on which both the Prometheus server and Alertmanager are installed
    • one instance “node1” running RHEL, configured as the node that Prometheus monitors for events
    • Prometheus is preconfigured to monitor the SELinux state
    • EDA is configured to run the Job Template “EDA - Harden RHEL server” to fix the SELinux setting
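To make the event flow of demo 1 more concrete, here is a rough sketch of the kind of decision a rulebook makes when an alert arrives. It is not the demo's rulebook: the alert name SELinuxNotEnforcing, the instance labels, and the function job_templates_to_run are hypothetical, and the payload only mimics the general shape of an Alertmanager webhook notification (a top-level status plus a list of alerts, each with its own status and labels). Only the Job Template name is taken from the description above.

```python
JOB_TEMPLATE = "EDA - Harden RHEL server"  # name from the demo description
ALERT_NAME = "SELinuxNotEnforcing"         # hypothetical alert name


def job_templates_to_run(payload):
    """Return (job_template, instance) pairs for matching firing alerts."""
    actions = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        is_firing = alert.get("status") == "firing"
        if is_firing and labels.get("alertname") == ALERT_NAME:
            actions.append((JOB_TEMPLATE, labels.get("instance", "unknown")))
    return actions


# Hypothetical payload: one firing alert and one that is already resolved.
sample_payload = {
    "status": "firing",
    "alerts": [
        {"status": "firing",
         "labels": {"alertname": ALERT_NAME, "instance": "node1:9100"}},
        {"status": "resolved",
         "labels": {"alertname": ALERT_NAME, "instance": "node2:9100"}},
    ],
}

print(job_templates_to_run(sample_payload))
```

Resolved alerts are skipped in this sketch so that only a node still in the bad state would trigger the fix.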

For web-only demos, Cockpit is preinstalled on each monitored system but disabled. To enable it:

  • systemctl enable cockpit
  • give the ec2-user a password
  • cockpit can be accessed on port 9090 on each monitored system (in demo 1 this is node1)

The public DNS entries of each node can be found either in the AWS web console or in the AAP EC2 inventory data.

You can optionally add the systems to your ansible-de DNS subzone for friendly DNS names.

URLs:

  • Prometheus web console is on port 9090 of the instance it runs on (“prometheus” in demo 1)
  • Alertmanager web console is on port 9093 of the instance it runs on (“prometheus” in demo 1)
  • Cockpit (if enabled) is on port 9090 of the monitored instance(s)
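As a convenience, the ports listed above can be turned into clickable URLs once the public DNS names are known. The helper below is only a sketch: the function name demo_urls and the example.com host names are placeholders, and the URL schemes are assumptions (plain HTTP for Prometheus and Alertmanager, HTTPS for Cockpit, which normally serves a self-signed certificate).

```python
def demo_urls(prometheus_host, monitored_hosts=()):
    """Build the web console URLs for the demo from public DNS names."""
    urls = {
        # Assumption: Prometheus and Alertmanager are served over plain HTTP.
        "prometheus": f"http://{prometheus_host}:9090",
        "alertmanager": f"http://{prometheus_host}:9093",
    }
    for host in monitored_hosts:
        # Cockpit also listens on 9090, but on the monitored system itself.
        urls[f"cockpit:{host}"] = f"https://{host}:9090"
    return urls


# Placeholder host names; use the public DNS entries of your own instances.
for name, url in demo_urls("prometheus.example.com",
                           ["node1.example.com"]).items():
    print(f"{name:<28} {url}")
```

Prometheus and Cockpit share port 9090 without conflict because they run on different instances in demo 1.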

It is entirely possible to modify the demo at will, or define other demo workflows.

  • Each function can be deployed on an instance of your choice
  • Extra monitored systems are possible
  • Study the demo 1 workflow building blocks for the job templates involved and their extra vars
  • Extra rules and associated rulebooks and fix playbooks can only be added through a git issue or merge request

Rerunning the demo 1 workflow usually works. If it does not, remove the workflow first and then run it again.