Last week, I was called by one of my customers, who told me they were seeing a huge performance degradation in their SCOM 2007 R2 environment.
This customer has quite a big infrastructure: around 2200 servers, using the Nworks MP, the NetApp MP, the Oracle Enterprise Manager connector, Opalis 6.3, scheduled maintenance mode, and more. The environment had been working well for a long time, and no major modification had been made recently.
The symptoms:
- Connecting to the console took more than 10 minutes.
- Console navigation extremely slow or not responding.
- High CPU usage on the RMS.
- Very high CPU usage on the DB Cluster.
- Connections to the databases lost from time to time.
After we had checked everything we could, it was time to call Microsoft Premier Support. After several days of investigation with them, we still had not clearly identified the root cause. So it was time to turn off, one by one, each connector and each third-party application connecting to Operations Manager.
It took time, but we finally found which application had been causing our nightmare for several days: it was the Maintenance Mode Scheduling Tool.
This tool was released as part of the Administration Resource Kit for System Center Operations Manager 2007 R2, published by the Operations Manager product team some months ago. More information is available here: http://blogs.technet.com/b/jimmyharper/archive/2011/07/14/maintenance-mode-scheduling-tool.aspx
This is really an excellent tool that provides a user interface for scheduling maintenance mode for your servers. My customer was using it a lot to disable monitoring during batch processing, planned reboots, and so on.
As soon as we turned it off by disabling the rule Maintenance Mode Workflow, the Operations Manager infrastructure went back to normal. So I continued to investigate why this tool was causing all our troubles, and I finally found this post: http://blogs.technet.com/b/momteam/archive/2011/06/21/schedule-maintenance-mode-reskit-tool-info.aspx
It gives the following recommendations:
- Do not go beyond 20 Jobs scheduled. Anything over this will start to place too much load on your Root Management Server
- Do not schedule more than 20 items in one Job. If you need to go over this please create a group and target this. Note: The MP has a bug that limits you to selecting only up to six objects. I have attached an updated MP that corrects this issue below.
- When scheduling a group make sure to select system.group then select the group. If you select the group itself the tool lists the individual group membership.
My customer had more than 40 Jobs scheduled, so we were clearly not following these best practices, which explains why performance had degraded gradually, day after day.
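The two numeric limits from that post (20 Jobs, 20 items per Job) are easy to check if you keep an inventory of your scheduled Jobs somewhere. Here is a minimal sketch in Python, assuming a hypothetical inventory built by hand or from an export. The `Job` class and the `check_schedule` function are my own illustration and are not part of the tool or of Operations Manager.

```python
from dataclasses import dataclass, field
from typing import List

# Limits quoted from the Operations Manager team post.
MAX_JOBS = 20
MAX_ITEMS_PER_JOB = 20


@dataclass
class Job:
    """Hypothetical record of one scheduled maintenance mode Job."""
    name: str
    targets: List[str] = field(default_factory=list)


def check_schedule(jobs: List[Job]) -> List[str]:
    """Return one warning per recommendation that the schedule breaks."""
    warnings = []
    if len(jobs) > MAX_JOBS:
        warnings.append(
            f"{len(jobs)} Jobs scheduled, recommended maximum is {MAX_JOBS}"
        )
    for job in jobs:
        if len(job.targets) > MAX_ITEMS_PER_JOB:
            warnings.append(
                f"Job '{job.name}' has {len(job.targets)} items, "
                f"recommended maximum is {MAX_ITEMS_PER_JOB}; target a group instead"
            )
    return warnings


if __name__ == "__main__":
    # Example: 41 small Jobs, similar to my customer's situation.
    inventory = [Job(f"job-{i}", ["server-a"]) for i in range(41)]
    for warning in check_schedule(inventory):
        print(warning)
```

A check like this only tells you when you are over the recommended limits. It does not measure the actual load on the Root Management Server.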
Currently, the only solution we have for scheduling maintenance mode without that tool is to go back to a PowerShell script run from a Windows scheduled task.
System Center Operations Manager 2007 R2 Evaluation : http://technet.microsoft.com/en-us/systemcenter/om/bb498233