FAQ, GUIDE, SCOM, System Center

SCOM – Alert basics

Ever wondered how the alerting works in SCOM? In this post we will go through the basics of alerts in SCOM, how alerts are generated, and what options alerts have.

1. Introduction

In this post we’ll go through some basics about alerts in SCOM. An alert are for many just a simple “alert”, however an alert can actually tell us much more than the actual problem itself, it can provide us information of something bigger going on. Alert happens for a reason, most of the times it is due to a failure in a software or in the hardware, but it may also be a configuration error that we may have missed when setting up something.

We’ll also go through how the alerts are generated and how the alert process works.

2. Alerts Alert_Bell1

Alerts help us to identify ongoing or upcoming issues that we may get within our data centre, it may be either software or hardware related issues. It is important to fully understand the meaning of the alert(s) that we receive before taking any action.

Alert Classification

All alerts in SCOM have a classification, this means that each alert has its own priority, it will also have its own severity depending on how serious the issue is.

Alert Severity

There are a total of three (3) different alert in SCOM, much like in the Windows event log:

Each severity has its own value, these numeric values are more commonly seen when creating overrides for either monitors or rules.

Alert Priority

Similar to the alert severities, there are also a total of three (3) different alert priorities in SCOM:

Alert priority is important, because it helps us determine how an alert should be processed, all management packs have a configured an alert priority for the alerts that may rise, but these priorities may not always be relevant according to our own environment and needs, therefore it’s important that each organisation checks the priorities and configures them based on their environment and needs.

Alert Resolution States

When an alert gets generated the resolution state will always start with the resolution state of New (ID: 0). Resolution states helps the IT support or whoever is managing the SCOM alerts to identify the status of an alert.

By default there are a total of seven (7) different resolution states in SCOM:

These can also be found in the Operations Manager console under the Administration pane, under Security > Settings > Alerts.

ResolutionStates

It is also possible to set custom resolution states, this can be done with solutions like System Center Orchestrator or PowerShell.
Each resolution state must have a unique ID, the IDs can range from 1 – 255, the default IDs cannot be used as they are already in use, the resolution state “descriptor” (Ex. “Assigned to Engineering”) can be anything.

3. Alert Generation

In SCOM there are two (2) possible ways of generating alerts:

  1. Alert generated by a Monitor UnitMonitor
  2. Alert generated by a Rule Rule

SCOM uses conditions for generating alerts, when a specific condition, or multiple conditions are met, only then will an alert be raised.

The condition could be for example:

  • Whenever a specific event in the Windows event log occurs, we raise an alert.
  • Whenever a service or operation is failing, we raise an alert.

Monitor UnitMonitor

A monitor measures the health of some aspect of a managed object, so what does this really mean?

A monitor can change its state between three (3) different states:

  1. Healthy (green) SCOM_healthy
  2. Warning (yellow) SCOM_Warning
  3. Critical (red) critical

When a monitor changes its state, let’s say from Healthy to Warning/Critical, the monitor will generate an alert.

To understand this better, let’s have a look at the graph below:

SCOM_monitor_alertmatrix

The graphs above describes which state changes can generate alerts from a monitor, an alert will only be generated if a monitor changes its state from Healthy to either Warning or Critical.

There are three (3) types of monitors available:

CreateMonitor

1. Unit Monitors

The unit monitor is the most common monitor, when we refer to a “monitor” we usually refer to a “unit monitor”. A unit monitor will measure some aspects of an application, it could be running a script to check something, or look for a specific event in the event log that indicates an error.

In the image below we’ll see an example of two unit monitors:

unit_monitor3

2. Dependency Rollup Monitors

A dependency monitor provides a health rollup between different classes, it will allow the health of a specific object to depend on the health of another object.

Similar to an aggregate rollup monitor, a dependency rollup monitor can be
used to group other monitors to set the health state and generate alerts.

In the image below we’ll see an example of how a dependency monitor works:

dependency_monitor

3. Aggregate Rollup Monitors

An aggregate rollup monitor provides a combined health state for similar monitors,
it reflects the state of unit, dependency rollup, or other aggregate rollup monitors targeted to an object.

You would typically use an aggregate rollup monitor to group multiple monitors into one monitor and then use that monitor to set the health state and generate an alert.

In the image below we’ll see an example of how an aggregate monitor works:

aggregate_monitor
The image below should give us a clear view of how all the monitors look like:

unit_monitor


Example

We have a monitor Percentage of Committed Memory in Use for monitoring the memory usage on a critical production Windows Server:

available_memory_%_scom

Note: The “Percentage of Committed Memory in Use” is disabled by default, an override has to be created to enable it.

The monitor has a threshold of 90% configured, this means that whenever the memory usage reaches 90% or above, an alert will be raised:

available_memory_%_scom2

  • If the memory usage is below 90% the monitor state will stay as HealthySCOM_healthy
  • If the memory usage is 90% or above the monitor state will change to Criticalcritical
    and an alert will be generated (the monitor state is is configurable in the override)

What happens if a monitor reaches its threshold multiple times?

The alert monitor will not generate alerts as long as the monitor is in an critical state.
However if the monitor has returned to a healthy state, and the memory threshold is reached again, and the monitor state changes to Critical, an alert will be generated.

If the monitor is configured to send an alert for warning or critical, and the monitor has already sent an alert when the state changed to warning, the monitor will only send a second alert when the stage changes from warning to critical if the first alert has been closed.

If the alert that was sent when the state changed to warning remains open, no alert will be sent when the state changes from warning to critical.

Rule Rule

Rules do not affect the health state of a target object, this means that rules can send as many alerts as it wants as long as the condition that caused the alert remains true.

There are three (3) types of rules available:

1. Alert Rules

An alert rule is used when we want to be able to generate an alert when an event is detected, the list below shows the available alert rule templates:

Alert_Rule

2. Collection Rules

Collection rules are used whenever we want to collect either performance data or events, this data will then be stored in the Operations Manager database and data warehouse.

If we want to collect data, the data will be automatically converted to either event or performance data, depending on the collection rule that we create.

Below is the list of available collection rule templates:

Collection_Rule

3. Command Rules

A command rule will basically run a script or a command on a schedule, below are the available templates:

Timed_Commands

Alert Suppression

To prevent a rule from “spamming” alerts we have the option to create an alert suppression. When an alert suppression is enabled for a rule it will increment a repeat counter and only send the first alert, while the rest of the alerts will be suppressed.

Note: Only duplicate alerts will be suppressed.

How can we check if a rule has generated many alerts?

In the Monitoring pane within the Operations Console, head to an alert view, right-click the upper toolbar and select Personalize view.

PersonalizeView

Next scroll down until we see Repeat Count, place a check mark in the check box and click OK.

Repeat_Count1

We should now see the Repeat Count column in our alert view:

Repeat_Count2


4. Monitor or Rule?

This is a quite common question that arises, when should we use a monitor instead of a rule? and vice versa?

Let’s sum up some facts about monitors and rules:

  • Both run workflows on a SCOM agent computer.
  • Both can generate alerts.
  • Both use similar set of data sources to detect conditions.

When to use a Monitor UnitMonitor

  • When want to receive a health state of an object, in other words: we want to be able to see “traffic lights” of an object.
  • When we want an alert to be automatically resolved, once the error condition has been cleared, the monitor is able to automatically detect if the problem has been resolved or not.
  • When we want to use performance thresholds, because there are no rules available to generate an alert from a performance threshold.
  • When we have a requirement that requires more complex logic, and isn’t possible to achieve with rules.

Example

A Windows Server is monitored by SCOM, every evening there are scheduled workloads being run, these workloads cause very high CPU usage. During the running workloads an alert is generated for the total CPU utilization percentage, this also means the monitor will change its state from healthy to critical, this is OK as we are aware of the scheduled workloads.

Once the workload jobs have completed, the CPU utilization become idle, which means that the alert for the CPU utilization that was raised earlier is no longer true. In this case we want the alert to be automatically resolved and the health state of the monitor to become green (healthy) again, this will be achieved automatically when using a monitor, because the monitor will detect that the problem has been resolved and therefore change the health state back to healthy.


Why do health states matter? It’s the colors that make a difference, or the “traffic lights” as some people call them, because we humans react fast on colors, especially the red color which according to studies encourages action.

Take for example a normal computer’s hard drives in the image below:

OS_disks_health

One of the two hard drives is starting to fill up on disk space, the  red color kind of “shouts” out to our eyes that something is wrong, it gives us a good indication that the disk requires interaction.

When to use a Rule Rule

  • When we want to collect performance counters or events for analysis and reporting.
  • When we want to simply create an alert without any health states.
  • When we want an alert to not be auto-closed.

How do we identify if an alert came from a Monitor or a Rule?

Every alert that we receive will tell us if the alert came from a rule or a monitor in the Alert Details.

Example

If an alert came from a rule, we will see something similar to this:

AlertRule

If an alert came from a monitor, we will see the something similar to the following:

AlertMon

We can also retrieve this information with PowerShell by running the Get-SCOMAlert cmdlet.

Example

Get-SCOMAlert -Name "Failed to Connect to Computer" | Select *

The command above will return all properties of a SCOM alert, the property that we are interested in is called “IsMonitorAlert” as shown in the image below:

IsMonitorAlert

IsMonitorAlert = True (Alert was generated by a monitor)

IsMonitorAlert = False (Alert was generated by a rule)

5. Alert Notifications

When SCOM generates an alert, the alert will be shown in an Alert View or in a Dashboard under the Monitoring pane in the Operations Console, SCOM can also notify individuals by sending the alert as a notification.

SCOM notifications consists of three (3) different components, all of these components are required to successfully send notifications:

  1. Channels
  2. Subscribers
  3. Subscriptions

Channels

SCOM uses “channels” this is the so called engine that will decide what type of notification we want to send, there are four (4) different ways of sending notifications in SCOM:

1. Email (SMTP) Notification Channel EmailChannel

The email notification is the most commonly used way of sending notifications, SCOM supports sending emails by using both internal and external email authentications.

Since SCOM 2019 there has been a lot of enhancements in the email notifications, the subscription criteria has also been enhanced, SCOM 2019 now also supports email notifications in HTML format. Read more about it HERE.

Here’s a comparison between SCOM 2016 and SCOM 2019:

SCOM 2016
SCOM2016_SMTP

SCOM 2019
SCOM2019_SMTP

2. Instant Message (IM) Notification Channel IMChannel

Notifications can also be send as instant messages, many organisations are using Skype for Business (previously known as Lync) as the main instant messaging communications program, with SCOM we can achieve to send instant messages directly to users on Skype for Business.

3. Text Message (SMS) Notification Channel SMSChannel

SCOM is also able to send text messages, more commonly known as “SMS” (Short Message Service). For this to work we’ll need either a SMS module or a modem where, which we’ll need to insert a SIM card to be able to send SMS to recipients.

Sending SCOM alerts as SMS notifications are also quite commonly used in SCOM, this is a great way of sending very critical alerts or alerts with high importance.

4. Command Notification Channel CommandChannel

SCOM is also able to make us of the commands, this can be used to run executable programs or scripts in response to an alert.

We can for example write to an event log, create a text file with the alert information or create a ticket to a ticketing system, or even use a scripts to send emails or SMS messages.

Subscribers

SCOM uses subscribers to define to whom and when we want to send notifications, a subscriber can be either a person, multiple persons or an email distribution list.

This is the only place where we are able to configure when SCOM should notify users of alerts.

Subscriber_Schedule

Subscriptions

A subscription is where we define:

  • The scope for the alert(s).
  • The channel to be used.
  • The subscriber(s) to be used.
  • The criteria of what we want SCOM to notify.

Since SCOM 2019 there has been enhancements in the subscription criteria, it is now possible to use regular expressions to build complex criteria for our subscriptions.
Read more about this over HERE.

Subscription comparison between SCOM 2016 and SCOM 2019:

SCOM 2016
SCOM2016_Criteria

SCOM 2019
SCOM2019_Criteria

6. Conclusion

We’ve now briefly gone through the basics of SCOM alerts, some of it’s properties and how they are generated. We should now be able to have a better understanding of what generates alerts and how notifications really work.

 8,724 total views,  1 views today

7 thoughts on “SCOM – Alert basics”

  1. Thanks for this overview! Can you advise if it’s possible to present which subscription handled an alert and sent it out so it can be viewed in Alert Views? Preferably via inbuilt capability rather than scripting. This would be incredibly useful.

    1. Hi David,

      I’m afraid this isn’t possible to achieve with Alert Views, you’ll need to manually check which subscription is configured to send out the specific alerts that you want to know about. An easy way to know which subscription the alert came from is to include the subscription name to the alert e-mail.

      Best regards,
      Leon

  2. Hi Leon,

    you think it is possible to change the banner color in scom 2019 for the HTML mail warnings….default is red but i would like to find how to change this to green when the status of the alert is set to resolved?

    Best regards
    Dave

    1. Hi Dave,

      I’m honestly not sure if this is possible, I’m no HTML guru so you would have to check with someone who knows HTML well.
      There are many more possibilities for modifications in SCOM 2019 regarding the email notifications, you have “Severity” property so you might be able to use that, maybe with some HTML conditional statements.

      https://docs.microsoft.com/en-us/system-center/scom/manage-notificiations-customize-message?view=sc-om-2019

      Best regards,
      Leon

    2. Hi Dave,
      It’s possible. Look into your html template. You should have an internal css there. In my case, I modified “.header-0, .header-1, header-2” into “.header-Resolved etc.” Please see below snippet. Just modify the “.background-color” attribute to the color of your liking.

      .header-Resolved{background-color: green;color: white;}
      .header-Acknowledged{background-color: lightskyblue;color: black;}
      .header-New{background-color: red;color: white;}
      .scom{display:none; max-height: 0px; overflow: hidden;}
      .severity-$Data[Default=’Not Present’]/Context/DataItem/Severity${display: inline;text-decoration:none;}
      .monitor-$Data[Default=’Not Present’]/Context/DataItem/CreatedByMonitor${display: inline;text-decoration:none;}
      .cell{float: left;overflow: hidden;text-overflow: ellipsis;white-space: nowrap; max-width: 100%;}

      Additionally, I changed the table class which is responsible for the header. I changed it from “header-$Data[Default=’Not Present’]/Context/DataItem/Severity$” to “header-$Data[Default=’Not Present’]/Context/DataItem/ResolutionStateName$”

      Regards,
      Orchyl

  3. How does the Percentage of Committed Memory in Use alert work with the page file? I’ve set the alert to be 90% and run a tool to increase memory to over 90%, and no alert generates. The total committed is at 50% (pagefile added another 8gb). When i drop the alert to 20% i get a notification.

    it seems to be monitoring the ‘Committed’ value, not the ‘In use’ value which is physical memory. Any luck on getting alerts generated for this?

Leave a Reply to The System Center Blog Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.