Detecting Gray Agents in SCOM 2012

I wrote a little about this issue in an earlier post. Unfortunately, I had to revisit the topic recently, both to clarify that earlier post and to add some additional checks.

First off, there are a number of ways to perform this detection: PowerShell, direct SQL queries, and custom management packs. Check this blog for some ideas on this; I found it useful.

In my specific case, I am working with the SCOM SDK / API and have a code base that currently makes no direct calls to SQL. I love bypassing the SCOM SDK for faster, easier access to SCOM data directly in SQL tables, but what if you were forced to stay within the SDK?

HealthState vs. IsAvailable

Once the agent is installed, it can be in one of 4 health states:

0 = Microsoft.EnterpriseManagement.Configuration.HealthState.Uninitialized
1 = Microsoft.EnterpriseManagement.Configuration.HealthState.Success
2 = Microsoft.EnterpriseManagement.Configuration.HealthState.Warning
3 = Microsoft.EnterpriseManagement.Configuration.HealthState.Error

These states are represented by different icons in the SCOM UI, but just to be clear, none of these states indicates that the agent is gray (grey). These health states suggest that communication with the health service on the agent side is up (meaning the agent is NOT gray). To get at the HealthState in the SCOM SDK:

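' Requires references to the SCOM SDK assemblies and these imports:
' Imports Microsoft.EnterpriseManagement
' Imports Microsoft.EnterpriseManagement.Administration
' Imports Microsoft.EnterpriseManagement.Configuration
Dim agents As IList(Of AgentManagedComputer)

Try
	' Connect to the management group through the SDK service on this machine
	Using mg = New ManagementGroup("localhost")
		agents = mg.Administration.GetAllAgentManagedComputers()
		For Each agent As AgentManagedComputer In agents
			If agent.HealthState = HealthState.Error Then
				'... agent is red; do something here
			End If
		Next
	End Using

Catch ex As Exception
	'... something did not go well; deal with it here
End Try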

This is very straightforward, but it does not give us gray agents.

The IsAvailable property of the HostedHealthService where the agent is running, on the other hand, indicates whether communication with the agent is up or down. Loss of communication with the health service (while the HostComputer is up) will force the agent into the gray state. Here’s how to get at this using the API:

Dim agents As IList(Of AgentManagedComputer)

Try
	Using mg = New ManagementGroup("localhost")
		agents = mg.Administration.GetAllAgentManagedComputers()
		For Each agent As AgentManagedComputer In agents
			' IsAvailable = False means the health service is not reporting in
			If Not agent.HostedHealthService.IsAvailable Then
				'... agent is gray; do something here
			End If
		Next
	End Using

Catch ex As Exception
	'... something did not go well; deal with it here
End Try

Stopping the System Center Management service on the monitored server will flip the agent into the gray state, and this loop should pick it up.

There is one more IsAvailable property: agent.HostComputer.IsAvailable. I am in the process of testing the difference between the IsAvailable properties of agent.HostComputer and agent.HostedHealthService. The working theory is that the former reflects the state of the managed computer and the latter reflects the state of the SCOM agent service. For now, my code monitors both IsAvailable properties and flags the agent if either of them goes to False.
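To make the "monitor both properties" approach concrete, here is a minimal sketch rather than my production code. It assumes the SCOM SDK assemblies are referenced and the SDK service is reachable on localhost. The module name GrayAgentCheck, the helper IsGray, and the console output are illustrative names of my own, and I am assuming agent.Name is an acceptable label for the report.

```vbnet
Imports System
Imports Microsoft.EnterpriseManagement
Imports Microsoft.EnterpriseManagement.Administration

Module GrayAgentCheck

	' Hypothetical helper: flag the agent as soon as either
	' availability flag is False.
	Function IsGray(hostAvailable As Boolean, healthServiceAvailable As Boolean) As Boolean
		Return Not (hostAvailable AndAlso healthServiceAvailable)
	End Function

	Sub Main()
		Try
			Using mg = New ManagementGroup("localhost")
				For Each agent As AgentManagedComputer In mg.Administration.GetAllAgentManagedComputers()
					' Working theory: HostComputer tracks the managed computer,
					' HostedHealthService tracks the SCOM agent service.
					Dim hostUp As Boolean = agent.HostComputer.IsAvailable
					Dim serviceUp As Boolean = agent.HostedHealthService.IsAvailable

					If IsGray(hostUp, serviceUp) Then
						Console.WriteLine("{0}: gray (host available = {1}, health service available = {2})",
							agent.Name, hostUp, serviceUp)
					End If
				Next
			End Using
		Catch ex As Exception
			' Connection or query failure; report it and move on
			Console.Error.WriteLine("Gray agent check failed: " & ex.Message)
		End Try
	End Sub

End Module
```

Printing both flags next to each other should also help with testing the working theory, since it shows which of the two properties went to False first.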

Agents vs. ManagementServers

Gray agents are a problem, but a much smaller one than gray management servers. When a SCOM management server's agent goes gray, chances are there is a critical issue (such as the SDK service not fully communicating with the database components, or similar). This issue can go unnoticed: the SCOM UI may still load and work for simple navigation tasks, yet the SCOM system is not healthy and monitoring is compromised. How do you pull this up in the SDK? Use the GetAllManagementServers method in ManagementGroup.Administration:

Dim servers As IList(Of ManagementServer)

Try
	Using mg = New ManagementGroup("localhost")
		servers = mg.Administration.GetAllManagementServers()
		For Each server As ManagementServer In servers
			' Same availability check as for agents, applied to management servers
			If Not server.HostedHealthService.IsAvailable Then
				'... management server is gray; do something here
			End If
		Next
	End Using

Catch ex As Exception
	'... something did not go well; deal with it here
End Try

And finally…

SDK vs. SQL

If you are adamant about using direct SQL queries, here they are:

SELECT * FROM ManagedEntityGenericView
INNER JOIN ManagedTypeView
ON ManagedEntityGenericView.MonitoringClassId = ManagedTypeView.Id
WHERE (ManagedTypeView.Name = 'Microsoft.SystemCenter.ManagementServer')

Use the one above to get the status of management server agents, and the one below for all the rest (monitored server agents):

SELECT * FROM ManagedEntityGenericView
INNER JOIN ManagedTypeView
ON ManagedEntityGenericView.MonitoringClassId = ManagedTypeView.Id
WHERE (ManagedTypeView.Name = 'Microsoft.SystemCenter.Agent')

The fields you are looking for are IsAvailable (0 = gray agent) and HealthState (3 = critical health state).
