Windows Server Forum / Virtual Server / July 2008
Problem with Hyper-V, NCQ SATA drives; Event ID 129 from nvstor64 saying "Reset to device, \Device\RaidPort0, was issued".
Bruce Sanderson - 19 Jul 2008 21:47 GMT This post is related to the one on 5 Jun 2008, but I think I asked the wrong question in that one!
The operating system is Windows Server 2008 RTM with the RTM version of the Hyper-V role installed (Windows6.0-KB950050-x64.msu).
The computer is a custom built with an ASUS P5N-D motherboard, which has the NVIDIA nForce 750i SLI chipset.
This computer has three SATA drives - two ST3320620AS and one ST3500320AS. The ST3500320AS has two partitions, one of which is the Windows "System" and "Boot" partition (hosts the operating system).
If command queuing is enabled on the ST3500320AS, I get frequent System Event Log entries with Event ID 129 from nvstor64 saying "Reset to device, \Device\RaidPort0, was issued".
When these Event Log entries are recorded, the system temporarily freezes - no response to mouse or keyboard, no video updates - everything stops for a few seconds then carries on as if nothing happened.
Enabling command queuing on either or both of the ST3320620AS drives does NOT cause this problem.
The ASUS web site says that problems with NCQ have been reported with the NVIDIA nForce 750i SLI chipset and that the solution is to update the firmware in the drive.
Does anyone know where I can get updated firmware for the ST3500320AS, or have any other clue to resolving this problem?
I've disabled command queuing on the ST3500320AS. The system works and I don't get the Event Log entries (Event ID 129 from nvstor64) or system freezes, but I suspect this is resulting in degraded disk (and thus system) performance.
 Signature Bruce Sanderson http://members.shaw.ca/bsanders
It is perfectly useless to know the right answer to the wrong question.
Meinolf Weber - 19 Jul 2008 22:10 GMT Hello Bruce,
Well, this question would be better posted to Seagate. If you know it's a firmware problem, that's their task, not Microsoft's.
Best regards
Meinolf Weber Disclaimer: This posting is provided "AS IS" with no warranties, and confers no rights. ** Please do NOT email, only reply to Newsgroups ** HELP us help YOU!!! http://www.blakjak.demon.co.uk/mul_crss.htm
> This post is related to the one on 5 Jun 2008, but I think I asked the wrong question in that one!
> [quoted text clipped - 35 lines]
> It is perfectly useless to know the right answer to the wrong question.
Charlie Russel - MVP - 19 Jul 2008 22:11 GMT GIYF.
http://www.seagate.com/www/en-us/support/downloads/other_downloads/cuda-fw
First hit on: ST3320320AS firmware site:seagate.com
In general, firmware updates are only available directly from the OEM involved. So, for the Seagate drives, you go to the Seagate site.
As for actual performance of the drive with and without NCQ - most of the reviews I've seen so far haven't shown a significant benefit to NCQ enablement in benchmark tests. Real world, of course, is not a benchmark, so YMMV.
 Signature Charlie.
> This post is related to the one on 5 Jun 2008, but I think I asked the wrong question in that one!
> [quoted text clipped - 31 lines]
> freezes, but I suspect this is resulting in degraded disk (and thus system) performance.
Bruce Sanderson - 20 Jul 2008 00:06 GMT Hello Charlie - good to hear from you! Thanks for your post.
I have reported the problem to Seagate - no reply yet.
Thanks for the hyperlink. Your searching skills must be far superior to mine because my searching on the Seagate site did not find it!
The page you reference specifically says that for the model ST3500320AS, which is triggering my problem, "no action is required".
I did find several posts on the Seagate site from people who had "updated" the firmware from SD15 to AD14 and now the drive doesn't work. I pulled my drive and found that it does indeed have firmware version SD15, so I don't think I'll be updating the firmware!
The ST3320620AS drives that do NOT cause the problem are Barracuda 7200.10
The ST3500320AS that DOES cause the problem is Barracuda 7200.11.
As to whether and how much performance degradation turning off command queuing introduces, I really don't know. I do notice that Resource Monitor sometimes reports the drive as "100%" busy when the virtual machines are busy, but what difference command queuing would make is hard to say.
I also found a thread on the Seagate "message board" where someone is having what appears to be a related problem with Vista 64 bit and a Seagate 7200.11 drive (http://forums.seagate.com/stx/board/message?board.id=ata_drives&message.id=1916&jump=true#M1916). I suggested they try disabling command queuing.
 Signature Bruce Sanderson http://members.shaw.ca/bsanders
It is perfectly useless to know the right answer to the wrong question.
> GIYF.
> [quoted text clipped - 45 lines]
>> freezes, but I suspect this is resulting in degraded disk (and thus system) performance.
Edwin vMierlo [MVP] - 20 Jul 2008 09:28 GMT
> This post is related to the one on 5 Jun 2008, but I think I asked the wrong question in that one!
> [quoted text clipped - 12 lines]
> Event Log entries with Event ID 129 from nvstor64 saying "Reset to device, \Device\RaidPort0, was issued".
Can you post the full 129 event? (Use the copy button, and include the hex data at the end.)
Thanks, Edwin
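For readers who want to pull the same event from the command line rather than copying it out of Event Viewer, the following is a minimal sketch. It assumes the built-in wevtutil tool is used with an XPath filter on the System log; the function names and the default count of 5 are illustrative and not part of the thread.

```python
import subprocess


def build_query_command(provider="nvstor64", event_id=129, count=5):
    """Build a wevtutil command that returns the newest matching events as XML."""
    xpath = f"*[System[Provider[@Name='{provider}'] and (EventID={event_id})]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",
        "/f:xml",        # XML output keeps the <Binary> element
        f"/c:{count}",   # limit the number of events returned
        "/rd:true",      # read newest events first
    ]


def query_events(**kwargs):
    """Run the query (Windows only) and return the raw XML text."""
    result = subprocess.run(
        build_query_command(**kwargs),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Only prints the command; call query_events() on the affected server.
    print(" ".join(build_query_command()))
```

The XML output is the useful part here, since the binary data at the end of the event is what the later analysis relies on.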
Bruce Sanderson - 21 Jul 2008 00:45 GMT Event Log entry as requested
Log Name: System
Source: nvstor64
Date: 19-Jul-2008 10:40:35 AM
Event ID: 129
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: disc2008HV.Discovery.sanderson
Description: Reset to device, \Device\RaidPort0, was issued.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="nvstor64" />
    <EventID Qualifiers="32772">129</EventID>
    <Level>3</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2008-07-19T17:40:35.935Z" />
    <EventRecordID>47825</EventRecordID>
    <Channel>System</Channel>
    <Computer>disc2008HV.Discovery.sanderson</Computer>
    <Security />
  </System>
  <EventData>
    <Data>\Device\RaidPort0</Data>
    <Binary>0F001800010000000000000081000480040000000000000000000000000000000000000000000000000000000000000000000000810004800000000000000000</Binary>
  </EventData>
</Event>
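As a rough illustration of how the fields in this event could be extracted programmatically, here is a small sketch using Python's standard XML parser. The embedded XML is a trimmed copy of the event above with the Binary value shortened to its first 16 bytes; the `summarize` helper and its dictionary keys are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML, taken from the event itself.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed copy of the event; <Binary> is cut to its first 16 bytes.
EVENT_XML = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="nvstor64" />
    <EventID Qualifiers="32772">129</EventID>
    <Level>3</Level>
    <TimeCreated SystemTime="2008-07-19T17:40:35.935Z" />
    <Channel>System</Channel>
  </System>
  <EventData>
    <Data>\\Device\\RaidPort0</Data>
    <Binary>0F001800010000000000000081000480</Binary>
  </EventData>
</Event>"""


def summarize(xml_text):
    """Return the provider, event ID, device name and raw binary payload."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    data = root.find("e:EventData", NS)
    return {
        "provider": system.find("e:Provider", NS).get("Name"),
        "event_id": int(system.find("e:EventID", NS).text),
        "device": data.find("e:Data", NS).text,
        "binary": bytes.fromhex(data.find("e:Binary", NS).text),
    }


if __name__ == "__main__":
    print(summarize(EVENT_XML))
```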
 Signature Bruce Sanderson http://members.shaw.ca/bsanders
It is perfectly useless to know the right answer to the wrong question.
>> This post is related to the one on 5 Jun 2008, but I think I asked the wrong
> [quoted text clipped - 24 lines]
> Thanks,
> Edwin
Edwin vMierlo [MVP] - 21 Jul 2008 09:12 GMT In the "Binary" element, which is really a string of hex (thanks Microsoft! great naming!), you can see the error code:
# for hex 0x80040081 / decimal -2147221375
  IO_WARNING_RESET
# as an HRESULT: Severity: FAILURE (1), FACILITY_ITF (0x4), Code 0x81
Furthermore, the bus, target and LUN are all set to 00 00 00.
So, do you have a device on bus=0, target=0, lun=0? If so, that is the device which had a timeout; contact your storage/HBA vendor to continue the investigation.
An event 129 is generated by storport.sys (the Microsoft driver) when a request to the lower-level HBA driver times out, but it is reported in the event log as coming from the HBA driver. In this case storport.sys had a timeout and issued a reset.
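The decoding described above can be sketched in a few lines of Python. This is only an illustration: it uses the first 16 bytes of the Binary value, assumes the values are little-endian 32-bit words, and assumes (from inspecting this one blob) that the code sits at byte offset 12. The helper names are hypothetical, and the field split simply mirrors the HRESULT-style reading quoted in the post.

```python
import struct

# First 16 bytes of the <Binary> value from the event; the rest is omitted.
BLOB_PREFIX = bytes.fromhex("0F001800010000000000000081000480")


def dwords(blob):
    """Split a byte string into little-endian unsigned 32-bit values."""
    usable = len(blob) - len(blob) % 4
    return list(struct.unpack(f"<{usable // 4}I", blob[:usable]))


def split_fields(code):
    """Split a 32-bit code into severity bit, facility and low 16-bit code."""
    return {
        "severity": code >> 31,            # 1 means failure in this reading
        "facility": (code >> 16) & 0x7FF,  # 11-bit facility field
        "code": code & 0xFFFF,
    }


def as_signed(code):
    """Reinterpret an unsigned 32-bit value as a signed decimal."""
    return code - 0x1_0000_0000 if code & 0x8000_0000 else code


values = dwords(BLOB_PREFIX)
status = values[3]  # assumption: the code is at byte offset 12 in this blob

if __name__ == "__main__":
    print(hex(status), as_signed(status), split_fields(status))
```

Reading the bytes `81 00 04 80` in little-endian order is what turns them into 0x80040081, which is why the value is not obvious when scanning the hex string by eye.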
> Event Log entry as requested
> [quoted text clipped - 25 lines]
> <EventData>
> <Data>\Device\RaidPort0</Data>
> <Binary>0F001800010000000000000081000480040000000000000000000000000000000000000000000000000000000000000000000000810004800000000000000000</Binary>
> </EventData>
> </Event>
> [quoted text clipped - 27 lines]
> > Thanks,
> > Edwin
Bruce Sanderson - 23 Jul 2008 03:52 GMT Edwin - thank you for your interest and response.
I'm aware that not every problem has a solution, problems have to be prioritized and maybe this one is pretty low on the list, but perhaps someone reading this newsgroup knows someone in Microsoft, NVIDIA or Seagate that might be interested in pursuing it.
This computer is in a very small, test/experimental environment, so there is definitely nothing "mission critical" here, but I'm always interested in learning about how things work; investigating problems is often quite enlightening, particularly in these newsgroups! The computer has an Intel Quad core Q6600 and 8 GB RAM, so it runs Windows Server 2008 with Hyper-V quite well.
I've reported the problem to ASUS, NVIDIA and Seagate. ASUS says they don't support Windows Server on this motherboard (P5N-D). No response from Seagate or NVIDIA yet.
Here's some additional information/clarification.
1. There are no SCSI or Fibre Channel devices in this system. All of the drives are directly connected to the NVIDIA SATA controller on the motherboard. So, there are no "HBA"s as I understand the term.
2. Here's the configuration as reported by Device Manager using the Devices by Connection view:
   a. PCI bus
      i. NVIDIA nForce Serial ATA Controller: Properties, Location: PCI Bus 0, device 14, function 0
         a) Port 0 ST3320620AS
            1) [ST332062 0AS SCSI Disk Drive - Properties - Location: Bus Number 1, Target Id 1, LUN 0]
         b) Port 1 ST3500320AS
            1) [ST350032 0AS SCSI Disk Drive - Properties - Location: Bus Number 0, Target Id 0, LUN 0]
      ii. NVIDIA nForce Serial ATA Controller: Properties, Location: PCI Bus 0, device 15, function 0
         a) Port 0 HL-DT-ST DVDRAM GH20NS10
            1) [HL-DT-ST DVDRAM GH20NS10 SCSI CdRom Device - Properties - Location: Bus Number 0, Target Id 0, LUN 0]
         b) Port 1 ST3500320AS
            1) [ST350032 0AS SCSI Disk Drive Properties - Location: Bus Number 1, Target Id 1, LUN 0]
3. I don't know why Windows Server 2008 Device Manager reports these devices as "SCSI" instead of "SATA". I have exactly the same motherboard (ASUS P5N-D) in another computer that is running Vista 64 bit SP1 and Device Manager on that system also reports the SATA drives as "SCSI".
3. I've determined by experiment that the only device which causes the 129 Event Log entries when its "Port" (2.a.i.b) has command queuing enabled is the ST3500320AS drive (2.a.i.b)1)). Enabling command queuing on the other two disk drives DOES NOT cause the 129 Event entries.
4. The NVIDIA site has a KB article (http://nvidia.custhelp.com/cgi-bin/nvidia.cfg/php/enduser/std_alp.php article # 768) that indicates some problems with NCQ with SATA drives on a different chipset (680i). The remedy there is to update the firmware on the drive. Charlie pointed me to a page on the Seagate site about firmware updates, but that page specifically says that the updates do not apply to this particular drive (ST3500320AS). There are posts on the Seagate site where some people have nonetheless attempted to "update" the firmware (from SD15 to AD14) and essentially ruined the drive. So, this does not appear to be a solution to this particular problem.
5. Charlie points out that any performance improvement from command queueing is likely to be marginal, so I'm willing to live with command queuing disabled.
Thoughts anyone?
 Signature Bruce Sanderson http://members.shaw.ca/bsanders
It is perfectly useless to know the right answer to the wrong question.
> In the "Binary" which is in really a string of hex (thanks Microsoft ! great
> [quoted text clipped - 84 lines]
>> > Thanks,
>> > Edwin
Edwin vMierlo [MVP] - 25 Jul 2008 14:53 GMT Replies inline.
> Edwin - thank you for your interest and response.
> [quoted text clipped - 31 lines]
> 1) [ST350032 0AS SCSI Disk Drive - Properties - Location: Bus Number 0, Target Id 0, LUN 0]
This is the one with B,T,L=0,0,0. Based on the binary data of the event (see my previous post), this would be my suspect.
> ii. NVIDIA nForce Serial ATA Controller: Properties, Location: PCI Bus 0, device 15, function 0
> a) Port 0 HL-DT-ST DVDRAM GH20NS10
> 1) [HL-DT-ST DVDRAM GH20NS10 SCSI CdRom Device - Properties - Location: Bus Number 0, Target Id 0, LUN 0]
This one also has B,T,L=0,0,0, but because it is a CD-ROM device it would not be my suspect.
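The narrowing-down step in this reply (match bus, target and LUN against the device list, then set aside the optical drive) can be written out as a small sketch. The device table below is transcribed from the Device Manager listing earlier in the thread; the `Device` type, the `kind` labels and the `candidates` helper are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Device(NamedTuple):
    controller: int  # PCI device number of the SATA controller
    port: int
    model: str
    kind: str        # "disk" or "cdrom"
    bus: int
    target: int
    lun: int


# Transcribed from the Device Manager listing posted in the thread.
DEVICES = [
    Device(14, 0, "ST3320620AS", "disk", 1, 1, 0),
    Device(14, 1, "ST3500320AS", "disk", 0, 0, 0),
    Device(15, 0, "HL-DT-ST DVDRAM GH20NS10", "cdrom", 0, 0, 0),
    Device(15, 1, "ST3500320AS", "disk", 1, 1, 0),
]


def candidates(devices, btl=(0, 0, 0), kind="disk"):
    """Return devices of the given kind at the given bus/target/LUN."""
    return [
        d for d in devices
        if (d.bus, d.target, d.lun) == btl and d.kind == kind
    ]


if __name__ == "__main__":
    for d in candidates(DEVICES):
        print(f"controller {d.controller}, port {d.port}: {d.model}")
```

Because two devices share B,T,L=0,0,0 on different controllers, the address alone does not identify the drive; the device type is what breaks the tie in this reading.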
> b) Port 1 ST3500320AS
> 1) [ST350032 0AS SCSI Disk Drive Properties - Location: Bus
> [quoted text clipped - 4 lines]
> P5N-D) in another computer that is running Vista 64 bit SP1 and Device Manager on that system also reports the SATA drives as "SCSI".
Interesting, I did not notice this until now; I must keep an eye out.
> 3. I've determined by experiment that the only device which causes the 129 Event Log entries when it's "Port" (2.a.i.b) has command queueing enabled is the ST3500320AS drive (2.a.i.b)1)). Enabling command queueing on the other two disk drives DOES NOT cauase the 129 Event entries.
Ah! Good info, definitely worth relaying to HD support (Seagate), although the website states that this drive supports NCQ.
> 4. The NVIDIA site has a KB artice
> (http://nvidia.custhelp.com/cgi-bin/nvidia.cfg/php/enduser/std_alp.php
> [quoted text clipped - 10 lines]
> is likely to be marginal, so I'm willing to live with command queuing disabled.
Back to your original remark, "This computer is in a very small, test/experimental environment": you need to ask yourself whether it is worthwhile pursuing.
Bruce Sanderson - 25 Jul 2008 18:39 GMT Thanks for the info, Edwin. NVIDIA says talk to ASUS; ASUS says "2008 not supported"; no response from Seagate yet.
I don't think I will do any more on this, just live with it!
 Signature Bruce Sanderson http://members.shaw.ca/bsanders/ It's perfectly useless to know the right answer to the wrong question.
> in line > [quoted text clipped - 96 lines] > test/experimental environment" you need to ask yourself the question if it > is worthwhile pursuing