Windows Server 2003 keeps rebooting...

edited December 2010 in Hardware
Greetings!

We have a server here that keeps rebooting. It's an HP ProLiant ML370 G5 running Windows Server 2003 with SP2. The stop codes vary; so far we've seen (all beginning with 0x000000) C2, 50, D1, 24, C5, 7E, 44, 0A, and 8E.
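For reference, the short codes listed above correspond to standard Windows bug check names. The lookup below is only an illustrative sketch; the helper name `describe` is made up for this example and is not part of any tool.

```python
# Standard Windows bug check names for the stop codes listed in the post.
STOP_CODES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x24: "NTFS_FILE_SYSTEM",
    0x44: "MULTIPLE_IRP_COMPLETE_REQUESTS",
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x7E: "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED",
    0x8E: "KERNEL_MODE_EXCEPTION_NOT_HANDLED",
    0xC2: "BAD_POOL_CALLER",
    0xC5: "DRIVER_CORRUPTED_EXPOOL",
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
}


def describe(short_code: str) -> str:
    """Hypothetical helper: expand a short code such as 'C2' to its full form."""
    value = int(short_code, 16)
    name = STOP_CODES.get(value, "UNKNOWN")
    return f"0x{value:08X} {name}"


if __name__ == "__main__":
    # Codes in the order they were reported in the post.
    for code in ["C2", "50", "D1", "24", "C5", "7E", "44", "0A", "8E"]:
        print(describe(code))
```

Several of these (C2, C5, and the pool-related 8E below) involve kernel pool memory, which tends to point at a driver or hardware problem rather than at one particular application.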

The reboots happen whenever we run any kind of backup *other* than a Full backup. We've tried both Backup Exec 12.5 and the backup utility that comes with the OS, and the reboots occur with both, so we don't think it's Backup Exec. They occur at different points in the backup, so no specific file or folder seems to be the trigger. Full backups run with no problem. Any help would be greatly appreciated!

Below is the debugger analysis of the MEMORY.DMP file from the most recent reboot:
____________________________________________________
Microsoft (R) Windows Debugger Version 6.12.0002.633 X86
Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [C:\Test\MEMORY.DMP]
Kernel Summary Dump File: Only kernel address space is available

Symbol search path is: C:\WINDOWS\Symbols;srv*
Executable search path is:
Windows Server 2003 Kernel Version 3790 (Service Pack 2) MP (4 procs) Free x86 compatible
Product: Server, suite: Enterprise TerminalServer SingleUserTS
Built by: 3790.srv03_sp2_gdr.100216-1301
Machine Name:
Kernel base = 0x80800000 PsLoadedModuleList = 0x808af9c8
Debug session time: Thu Oct 14 09:36:28.491 2010 (UTC - 4:00)
System Uptime: 0 days 2:27:15.625
Loading Kernel Symbols
...............................................................
............................................................
Loading User Symbols
PEB is paged out (Peb.Ldr = 7ffda00c). Type ".hh dbgerr001" for details
Loading unloaded module list
.......
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 8E, {c0000005, 8089bcdc, b914fac8, 0}

*** ERROR: Symbol file could not be found. Defaulted to export symbols for mfehidk.sys -
Probably caused by : Pool_Corruption ( nt!ExFreePool+f )

Followup: Pool_corruption


3: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

KERNEL_MODE_EXCEPTION_NOT_HANDLED (8e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Some common problems are exception code 0x80000003. This means a hard
coded breakpoint or assertion was hit, but this system was booted
/NODEBUG. This is not supposed to happen as developers should never have
hardcoded breakpoints in retail code, but ...
If this happens, make sure a debugger gets connected, and the
system is booted /DEBUG. This will let us see why this breakpoint is
happening.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: 8089bcdc, The address that the exception occurred at
Arg3: b914fac8, Trap Frame
Arg4: 00000000

Debugging Details:

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".

FAULTING_IP:
nt!ExAllocatePoolWithTag+838
8089bcdc 8b07 mov eax,dword ptr [edi]

TRAP_FRAME: b914fac8 -- (.trap 0xffffffffb914fac8)
ErrCode = 00000000
eax=8a7bc170 ebx=8a7bb0c0 ecx=8a7bc170 edx=8a7bc170 esi=8a7bb210 edi=04c507b6
eip=8089bcdc esp=b914fb3c ebp=b914fb78 iopl=0 nv up ei pl nz na pe cy
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010207
nt!ExAllocatePoolWithTag+0x838:
8089bcdc 8b07 mov eax,dword ptr [edi] ds:0023:04c507b6=????????
Resetting default scope

DEFAULT_BUCKET_ID: DRIVER_FAULT

BUGCHECK_STR: 0x8E

PROCESS_NAME: cqmgstor.exe

CURRENT_IRQL: 0

LAST_CONTROL_TRANSFER: from 8085bba7 to 8087c4a0

STACK_TEXT:
b914f694 8085bba7 0000008e c0000005 8089bcdc nt!KeBugCheckEx+0x1b
b914fa58 808346cc b914fa74 00000000 b914fac8 nt!KiDispatchException+0x3a2
b914fac0 80834680 b914fb78 8089bcdc badb0d00 nt!CommonDispatchException+0x4a
b914fae4 8089c26e fe4c4cd0 00000000 00000000 nt!Kei386EoiHelper+0x186
b914fb78 f711eb2c 00000001 00000000 3945464d nt!ExFreePool+0xf
WARNING: Stack unwind information not available. Following frames may be wrong.
b914fb94 f711ecc7 88c35f08 b914fcf0 00000000 mfehidk+0xcb2c
b914fcdc f713162a b914fcf0 005cfe4c b914fd64 mfehidk+0xccc7
b914fd18 f713201d 85499dc8 b914fd44 b914fd4c mfehidk+0x1f62a
b914fd64 7c82860c badb0d00 005cfe34 00000000 mfehidk+0x2001d
b914fd68 badb0d00 005cfe34 00000000 00000000 0x7c82860c
b914fd6c 005cfe34 00000000 00000000 00000000 0xbadb0d00
b914fd70 00000000 00000000 00000000 00000000 0x5cfe34

STACK_COMMAND: kb

FOLLOWUP_IP:
nt!ExFreePool+f
8089c26e 5d pop ebp

SYMBOL_STACK_INDEX: 4

SYMBOL_NAME: nt!ExFreePool+f

FOLLOWUP_NAME: Pool_corruption

IMAGE_NAME: Pool_Corruption

DEBUG_FLR_IMAGE_TIMESTAMP: 0

MODULE_NAME: Pool_Corruption

FAILURE_BUCKET_ID: 0x8E_nt!ExFreePool+f

BUCKET_ID: 0x8E_nt!ExFreePool+f

Followup: Pool_corruption

3: kd> lmvm Pool_Corruption
start end module name
_________________________________________________________________
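One quick way to see which modules besides the kernel (`nt`) appear in the STACK_TEXT above is to parse the frames. This is only a sketch: the regular expression assumes the five-column `kb`-style layout shown in this dump, and `modules_on_stack` is a name invented for the example.

```python
import re

# Frames copied from the STACK_TEXT section of the dump above.
STACK_TEXT = """\
b914f694 8085bba7 0000008e c0000005 8089bcdc nt!KeBugCheckEx+0x1b
b914fa58 808346cc b914fa74 00000000 b914fac8 nt!KiDispatchException+0x3a2
b914fac0 80834680 b914fb78 8089bcdc badb0d00 nt!CommonDispatchException+0x4a
b914fae4 8089c26e fe4c4cd0 00000000 00000000 nt!Kei386EoiHelper+0x186
b914fb78 f711eb2c 00000001 00000000 3945464d nt!ExFreePool+0xf
b914fb94 f711ecc7 88c35f08 b914fcf0 00000000 mfehidk+0xcb2c
b914fcdc f713162a b914fcf0 005cfe4c b914fd64 mfehidk+0xccc7
b914fd18 f713201d 85499dc8 b914fd44 b914fd4c mfehidk+0x1f62a
b914fd64 7c82860c badb0d00 005cfe34 00000000 mfehidk+0x2001d
b914fd68 badb0d00 005cfe34 00000000 00000000 0x7c82860c
b914fd6c 005cfe34 00000000 00000000 00000000 0xbadb0d00
b914fd70 00000000 00000000 00000000 00000000 0x5cfe34
"""

# Five 8-digit hex columns, then module[!symbol]+0xoffset.
# Raw addresses such as "0x7c82860c" have no "+0x" part and are skipped.
FRAME = re.compile(
    r"^(?:[0-9a-f]{8} ){5}(?P<module>\w+)(?:!(?P<symbol>\w+))?\+0x[0-9a-f]+$"
)


def modules_on_stack(text: str) -> list[str]:
    """Return module names in the order they first appear on the stack."""
    seen: list[str] = []
    for line in text.splitlines():
        match = FRAME.match(line.strip())
        if match and match.group("module") not in seen:
            seen.append(match.group("module"))
    return seen


if __name__ == "__main__":
    third_party = [m for m in modules_on_stack(STACK_TEXT) if m != "nt"]
    print(third_party)
```

For this dump the only non-kernel module is `mfehidk`, the driver the debugger could not load symbols for. Being on the stack does not by itself make it the culprit: with pool corruption, the driver that trips over the damaged pool is often not the one that damaged it.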

Thanks again for the help!

mikepbmike

Comments

  • shwaip bluffin' with my muffin Icrontian
    edited October 2010
    memtest, chkdsk?
  • edited October 2010
    shwaip wrote:
    memtest, chkdsk?

    Sometimes chkdsk would run during the restart cycle. Below is what we got most recently:
    ______________________________________________________________________
    "The file system structure on the disk is corrupt and unusable. Please run the chkdsk utility on the volume \Device\HarddiskVolume1."
    _________________________________________________________________________________


    Chkdsk has been run a few times. The following was the most recent chkdsk message:
    _________________________________________________________________________________
    Checking file system on C:
    The type of the file system is NTFS.

    One of your disks needs to be checked for consistency. You
    may cancel the disk check, but it is strongly recommended
    that you continue.
    Windows will now check the disk.
    Index entry {CA0CB8D8-FA7C-43B5-BC5C-0576AA521238}_2.fh of index $I30 in file 0x7d89 points to unused file 0x700c.
    Deleting index entry {CA0CB8D8-FA7C-43B5-BC5C-0576AA521238}_2.fh in index $I30 of file 32137.
    Index entry {CA0CB~2.FH of index $I30 in file 0x7d89 points to unused file 0x700c.
    Deleting index entry {CA0CB~2.FH in index $I30 of file 32137.
    Cleaning up minor inconsistencies on the drive.
    Cleaning up 703 unused index entries from index $SII of file 0x9.
    Cleaning up 703 unused index entries from index $SDH of file 0x9.
    Cleaning up 703 unused security descriptors.
    CHKDSK discovered free space marked as allocated in the
    master file table (MFT) bitmap.
    CHKDSK discovered free space marked as allocated in the volume bitmap.
    Windows has made corrections to the file system.

    71644783 KB total disk space.
    26161608 KB in 40372 files.
    15064 KB in 8524 indexes.
    0 KB in bad sectors.
    85391 KB in use by the system.
    23040 KB occupied by the log file.
    45382720 KB available on disk.

    4096 bytes in each allocation unit.
    17911195 total allocation units on disk.
    11345680 allocation units available on disk.

    Internal Info:
    20 c5 00 00 0b bf 00 00 44 17 01 00 00 00 00 00 .......D.......
    8d 00 00 00 02 00 00 00 63 08 00 00 00 00 00 00 ........c.......
    08 36 c3 03 00 00 00 00 10 d4 0f 11 00 00 00 00 .6..............
    8e 8d b2 13 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00 00 00 00 00 00 00 00 4e 7a 02 30 00 00 00 00 ........Nz.0....
    e0 9c f2 9e 00 00 00 00 ff ff ff ff 11 00 00 00 ................
    b4 9d 00 00 00 00 00 00 00 20 c7 3c 06 00 00 00 ......... .<....

    Windows has finished checking your disk.
    Please wait while your computer restarts.
    ________________________________________________________________________________

    We haven't run a memtest on it; how long would it take to check 4GB of RAM? This is a heavy-duty server that has folks connecting to it potentially around-the-clock.

    Thanks.
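As a sanity check on the chkdsk summary above, the reported sizes can be cross-checked against each other. This is just a sketch using the figures from that output; the function and constant names are invented for the example.

```python
# Figures copied from the chkdsk summary above.
BYTES_PER_UNIT = 4096
TOTAL_UNITS = 17911195
FREE_UNITS = 11345680

TOTAL_KB = 71644783
FILES_KB = 26161608
INDEXES_KB = 15064
BAD_SECTORS_KB = 0
SYSTEM_KB = 85391  # already includes the 23040 KB log file
FREE_KB = 45382720


def units_to_kb(units: int, bytes_per_unit: int = BYTES_PER_UNIT) -> int:
    """Convert a count of allocation units to whole kilobytes."""
    return units * bytes_per_unit // 1024


if __name__ == "__main__":
    # Free space: 11345680 units * 4 KB matches the reported figure exactly.
    print(units_to_kb(FREE_UNITS), FREE_KB)

    # Total: 17911195 units * 4 KB is 3 KB short of the reported total,
    # presumably because the volume is not an exact multiple of 4096 bytes.
    print(units_to_kb(TOTAL_UNITS), TOTAL_KB)

    # The per-category figures add up to the reported total.
    used_and_free = FILES_KB + INDEXES_KB + BAD_SECTORS_KB + SYSTEM_KB + FREE_KB
    print(used_and_free == TOTAL_KB)
```

The numbers are self-consistent and show 0 KB in bad sectors, so this particular run found logical file system damage (orphaned index entries, bitmap mismatches) rather than failing sectors.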
  • RootWyrm Icrontian
    edited October 2010
    Your HP Smart Array drivers are probably out of date. The latest version should fix this issue. The corruption's a result of a crash in those drivers. The Insight Manager error is a red herring.
  • edited October 2010
    RootWyrm wrote:
    Your HP Smart Array drivers are probably out of date. The latest version should fix this issue. The corruption's a result of a crash in those drivers. The Insight Manager error is a red herring.


    Just had a reboot about 10 min. ago. What we saw on the BSOD was Bad_Pool_Caller with a stop error of 0x000000c2. Googled it & found: "'Stop 0x000000C2 BAD_POOL_CALLER' error message in Windows Server 2003." This has a hotfix that we tried to install, but we got a message saying the service pack (SP2) was newer than the hotfix, so the hotfix didn't need to be applied. Go figure. I'm going to download the latest Smart Array software from HP & see what happens with that.
  • edited December 2010
    mikepbmike wrote:
    Just had a reboot about 10 min. ago. What we saw on the BSOD was Bad_Pool_Caller with a stop error of 0x000000c2. Googled it & found: "'Stop 0x000000C2 BAD_POOL_CALLER' error message in Windows Server 2003." This has a hotfix that we tried to install, but we got a message saying the service pack (SP2) was newer than the hotfix, so the hotfix didn't need to be applied. Go figure. I'm going to download the latest Smart Array software from HP & see what happens with that.


    The issue has been resolved! I had to update the Support Pack and reboot, then apply a firmware update and reboot again. The server has been stable ever since, with no more reboots. Woof! Just wanted to pass along what worked for me.

    Thanks for all the help!

    mikePBmike