SCOM2007 : Performance Degradation / Maintenance Mode Scheduling Tool

Hi Everyone,

Last week, I was called by one of my customers, who told me they were seeing a huge performance degradation in their SCOM2007 R2 environment.

This customer has a fairly large infrastructure of around 2200 servers, using the Nworks MP, the NetApp MP, the Oracle Enterprise Manager connector, Opalis 6.3, scheduled maintenance mode, and so on. The environment had been working well for a long time, and no major modification had been made to it.

The symptoms :

  • Connecting to the console took more than 10 minutes.
  • Console navigation was extremely slow or not responding.
  • High CPU usage on the RMS.
  • Very high CPU usage on the DB cluster.
  • Connections to the DBs were lost from time to time.

After we had checked everything we could, it was time to call Microsoft Premier Support. After several days of investigation with them, we still had not clearly identified the root cause. So it was time to turn off, one by one, each connector and each third-party application that was connecting to Operations Manager.

It took time, but we finally found which application had been causing our nightmare for several days: it was the Maintenance Mode Scheduling Tool.

This tool was released as part of the Administration Resource Kit for System Center Operations Manager 2007 R2, published by the Operations Manager product team some months ago. More information is available here : http://blogs.technet.com/b/jimmyharper/archive/2011/07/14/maintenance-mode-scheduling-tool.aspx

This is really an excellent tool that provides a user interface for scheduling maintenance mode for your servers. My customer was using it a lot to disable monitoring during batch processing, planned reboots and so on.

Once we turned it off by disabling the rule Maintenance Mode Workflow, the Operations Manager infrastructure went back to normal. So I continued to investigate why this tool was causing all our troubles, and I finally found this post, which lists the best practices quoted below : http://blogs.technet.com/b/momteam/archive/2011/06/21/schedule-maintenance-mode-reskit-tool-info.aspx

Best Practices:

  • Do not go beyond 20 Jobs scheduled.  Anything over this will start to place too much load on your Root Management Server
  • Do not schedule more than 20 items in one Job.  If you need to go over this please create a group and target this.  Note: The MP has a bug that only limits you to select up to six objects.  I have attached an updated MP that corrects this issue below.
  • When scheduling a group make sure to select system.group then select the group.  If you select the group itself the tool lists the individual group membership.
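
To make the first two limits concrete, here is a minimal Python sketch that checks an inventory of scheduled jobs against them. It is only an illustration: the function name, the job names and the dictionary layout are assumptions made up for this example, and the tool itself does not export its schedule in this form, so you would have to build the inventory yourself.

```python
# Limits taken from the best practices quoted above.
MAX_JOBS = 20
MAX_ITEMS_PER_JOB = 20


def check_schedule(jobs):
    """Return a list of warnings for a {job_name: [object names]} mapping.

    The mapping is a hypothetical inventory format, not something the
    Maintenance Mode Scheduling Tool produces.
    """
    warnings = []
    if len(jobs) > MAX_JOBS:
        warnings.append(f"{len(jobs)} jobs scheduled, limit is {MAX_JOBS}")
    for name, items in jobs.items():
        if len(items) > MAX_ITEMS_PER_JOB:
            warnings.append(
                f"job '{name}' targets {len(items)} items, limit is "
                f"{MAX_ITEMS_PER_JOB}: target a group instead"
            )
    return warnings


if __name__ == "__main__":
    # Hypothetical inventory: 40 small jobs, one of them made oversized.
    jobs = {f"job-{i:02d}": ["server-a", "server-b"] for i in range(1, 41)}
    jobs["job-01"] = [f"server-{n}" for n in range(25)]
    for warning in check_schedule(jobs):
        print(warning)
```

An empty result means the inventory stays within both limits; any warning is a hint to merge jobs or to replace a long list of objects with a group.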

My customer had more than 40 Jobs scheduled, so we were clearly not following the best practices, and that explains why performance degraded slowly, day after day.

Currently, the only solution we have for scheduling maintenance mode without that tool is to go back to a PowerShell script run from a Windows scheduled task.

Resources :
System Center Operations Manager 2007 R2 Evaluation : http://technet.microsoft.com/en-us/systemcenter/om/bb498233

Technet Forums : http://social.technet.microsoft.com/Forums/en-US/category/systemcenteroperationsmanager

Cheers
Christopher KEYAERT
http://twitter.com/keyaertc

About Christopher Keyaert

Christopher Keyaert is a Consultant, focused on helping partners to leverage the System Center and Microsoft Azure cloud platform. He is also a Microsoft Most Valuable Professional (MVP) for Cloud and Data Center Management and a Microsoft Certified Trainer (MCT).

2 Responses to SCOM2007 : Performance Degradation / Maintenance Mode Scheduling Tool

  1. Sameer says:

    Excellent post, Chris. We are going through the exact same situation and have had a case open with Microsoft for 2 months, but with no solution.

    Can you please give more details on how you disabled the connectors one at a time? Though we don’t use the maintenance mode scheduler much, with only 5 jobs so far, I would still like to start by disabling that connector.
    More details on how you did that would really help.

    Thanks
    Sameer

  2. Sameer says:

    Thanks so much for this post, Chris. We had been struggling with a Microsoft tier 3 engineer for the last 4 months, but found no solution.
    Disabling the rule fixed the issue!! Thanks a lot once again.
