Discussion:
linux isn't robust enough to handle bad sector??
Long Wind
2020-09-20 02:40:01 UTC
I'm creating a filesystem on a problem disk.
Although it has passed the short and long SMART self-tests, I hit a bad sector: mkfs complains forever and Ctrl+C can't kill it. What should I do now?
[ 2719.089156] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 2719.089171] ata4.00: failed command: WRITE DMA EXT
[ 2719.089185] ata4.00: cmd 35/00:00:18:68:e1/00:08:0c:00:00/e0 tag 0 dma 1048576 out
                        res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 2719.089192] ata4.00: status: { DRDY }
[ 2719.089209] ata4: hard resetting link
[ 2719.404384] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 2719.412162] ata4.00: configured for UDMA/33
[ 2719.412199] ata4: EH complete
[ 2749.813187] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 2749.813201] ata4.00: failed command: WRITE DMA EXT
[ 2749.813208] ata4.00: cmd 35/00:00:18:68:e1/00:08:0c:00:00/e0 tag 0 dma 1048576 out
                        res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 2749.813212] ata4.00: status: { DRDY }
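For anyone reading along, the `cmd` line in the log above can be decoded to see where on the disk the failed write was aimed. This is a minimal sketch, assuming libata prints the taskfile in the order command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device; the function name `decode_lba48` is made up for illustration.

```python
def decode_lba48(cmd):
    """Return (start_lba, sector_count) from a libata "cmd" taskfile string.

    Assumed field order (an assumption about the log format):
    command/feature:nsect:lbal:lbam:lbah/
    hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    """
    parts = cmd.strip().split("/")
    if len(parts) != 4:
        raise ValueError("expected four '/'-separated groups")
    current = [int(x, 16) for x in parts[1].split(":")]
    previous = [int(x, 16) for x in parts[2].split(":")]
    if len(current) != 5 or len(previous) != 5:
        raise ValueError("expected five ':'-separated bytes per group")
    _feature, nsect, lbal, lbam, lbah = current
    _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = previous

    # 48-bit LBA: the "hob" (high order) bytes hold bits 24..47.
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    # 16-bit sector count; 0 means 65536 sectors for 48-bit commands.
    count = (hob_nsect << 8) | nsect
    if count == 0:
        count = 65536
    return lba, count


if __name__ == "__main__":
    lba, count = decode_lba48("35/00:00:18:68:e1/00:08:0c:00:00/e0")
    print(f"start LBA {lba:#x}, {count} sectors, {count * 512} bytes")
```

With the logged command this yields 2048 sectors, and 2048 x 512 = 1048576 bytes agrees with the `dma 1048576 out` on the same line. Both failures carry the same `cmd` string, so it is the same write being retried.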
Gene Heskett
2020-09-20 03:20:01 UTC
Post by Long Wind
I'm creating a filesystem on a problem disk.
Although it has passed the short and long SMART self-tests, I hit a bad
sector: mkfs complains forever and Ctrl+C can't kill it. What should I do
now?
[ 2719.089156] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 2719.089171] ata4.00: failed command: WRITE DMA EXT
[ 2719.089185] ata4.00: cmd 35/00:00:18:68:e1/00:08:0c:00:00/e0 tag 0 dma 1048576 out
                        res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 2719.089192] ata4.00: status: { DRDY }
[ 2719.089209] ata4: hard resetting link
[ 2719.404384] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 2719.412162] ata4.00: configured for UDMA/33
[ 2719.412199] ata4: EH complete
[ 2749.813187] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 2749.813201] ata4.00: failed command: WRITE DMA EXT
[ 2749.813208] ata4.00: cmd 35/00:00:18:68:e1/00:08:0c:00:00/e0 tag 0 dma 1048576 out
                        res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 2749.813212] ata4.00: status: { DRDY }
This is a long shot, but try a new SATA cable, particularly if the
existing one is hot red and over 3 years old. Use any color but hot red.
There is something in that plastic dye that destroys the copper in the
cable over time, first observed by this old service tech in the
mid-1970s, when that color showed up in Japanese microphone cables for CB
radios. It's been a ticking time bomb ever since.

Cheers, Gene Heskett
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
If we desire respect for the law, we must first make the law respectable.
- Louis D. Brandeis
Genes Web page <http://geneslinuxbox.net:6309/gene>
Dan Ritter
2020-09-20 04:00:01 UTC
Post by Long Wind
I'm creating a filesystem on a problem disk.
Although it has passed the short and long SMART self-tests, I hit a bad sector: mkfs complains forever and Ctrl+C can't kill it. What should I do now?
[ 2719.089156] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 2719.089171] ata4.00: failed command: WRITE DMA EXT
[ 2719.089185] ata4.00: cmd 35/00:00:18:68:e1/00:08:0c:00:00/e0 tag 0 dma 1048576 out
                        res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 2719.089192] ata4.00: status: { DRDY }
[ 2719.089209] ata4: hard resetting link
[ 2719.404384] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 2719.412162] ata4.00: configured for UDMA/33
[ 2719.412199] ata4: EH complete
[ 2749.813187] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 2749.813201] ata4.00: failed command: WRITE DMA EXT
[ 2749.813208] ata4.00: cmd 35/00:00:18:68:e1/00:08:0c:00:00/e0 tag 0 dma 1048576 out
                        res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 2749.813212] ata4.00: status: { DRDY }
In order of most likely to least likely:

- replace the disk
- replace the cabling to the disk
- replace the disk controller
- replace the power supply

You can scan for bad blocks with the badblocks tool, but if you
have repeated problems like this, the disk is probably not reliable.

If you need your data to survive events like this later in the
disk's life, you will need backups, RAID, ZFS, or similar
measures.

-dsr-
Long Wind
2020-09-20 05:00:01 UTC
Thanks, Gene and Dan!
In the end I had to unplug the power cable to shut down. I've used the problem disk on other computers, so it's probably not the cable's fault. The disk has 190G of unused space; now I'll re-partition and try my luck at another location on the disk.
David Christensen
2020-09-20 06:20:01 UTC
Post by Long Wind
I'm creating a filesystem on a problem disk.
Although it has passed the short and long SMART self-tests, I hit a bad sector: mkfs complains forever and Ctrl+C can't kill it. What should I do now?
[ 2719.089156] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 2719.089171] ata4.00: failed command: WRITE DMA EXT
[ 2719.089185] ata4.00: cmd 35/00:00:18:68:e1/00:08:0c:00:00/e0 tag 0 dma 1048576 out
                        res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 2719.089192] ata4.00: status: { DRDY }
[ 2719.089209] ata4: hard resetting link
[ 2719.404384] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 2719.412162] ata4.00: configured for UDMA/33
[ 2719.412199] ata4: EH complete
[ 2749.813187] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 2749.813201] ata4.00: failed command: WRITE DMA EXT
[ 2749.813208] ata4.00: cmd 35/00:00:18:68:e1/00:08:0c:00:00/e0 tag 0 dma 1048576 out
                        res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 2749.813212] ata4.00: status: { DRDY }
First, back up your data.


Please run the following command and post your complete console session
-- prompt, command, output. Substitute DISKID as appropriate:

# smartctl -x /dev/disk/by-id/DISKID


Test the drive with the manufacturer diagnostic utility. Understand
that current tools usually require Microsoft Windows. Run all available
tests. Wipe (zero) the entire drive. Test again. If the wipe completes
without errors and the second round of tests completes without errors,
put the drive back into service.


If not, disconnect the power cable and SATA cable (at both ends).
Reconnect the cables and try again.


If not, replace the SATA cable with a new SATA 6 Gbps cable and try again.


If not, move the drive to another caddy and/or rack, re-plug cables, and
try again.


If not, connect the cable to a different port and try again.


If not, put the drive into another computer, re-plug cables, and try again.


If not, RMA or recycle the drive.


David
Long Wind
2020-09-20 08:00:01 UTC
On Sunday, September 20, 2020, 2:15:21 PM GMT+8, David Christensen
First, backup your data.
Please run the following command and post your complete console session
-- prompt, command, output.  Substitute DISKID as appropriate:

    # smartctl -x /dev/disk/by-id/DISKID


Test the drive with the manufacturer diagnostic utility.  Understand
that current tools usually require Microsoft Windows.  Run all available
tests.  Wipe/ zero the entire drive.  Test again.  If the wipe completes
without errors and second round of tests complete without errors, put
the drive back into service.


If not, disconnect the power cable and SATA cable (at both ends).
Reconnect the cables and try again.


If not, replace the SATA cable with a new SATA 6 Gbps cable and try again.


Thanks, David! I changed the data cable, but it doesn't seem to help. I'd better throw the disk away; the time I've spent serving it is far more than it has served me. And I doubt the credibility of the SMART report and the badblocks check: the disk nearly passed a badblocks test with the -w option.
/sbin/smartctl -x /dev/sdb
smartctl 6.6 2017-11-05 r4594 [i686-linux-4.19.0-10-686] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, http://www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Seagate Pipeline HD 5900.2
Device Model: ST3320311CS
Serial Number: 6VV842DN
LU WWN Device Id: 5 000c50 032bd3f74
Firmware Version: SC13
User Capacity: 320,072,933,376 bytes [320 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 5900 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 2.6, 3.0 Gb/s (current: 1.5 Gb/s)
Local Time is: Sun Sep 20 15:45:24 2020 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM level is: 254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Unknown

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 653) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 88) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x103b) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   117   099   006    -    152257666
  3 Spin_Up_Time            PO----   097   097   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    685
  5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    0
  7 Seek_Error_Rate         POSR--   080   060   030    -    116258178
  9 Power_On_Hours          -O--CK   096   096   000    -    4018
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    348
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   099   001   000    -    2242007139067
189 High_Fly_Writes         -O-RCK   082   082   000    -    18
190 Airflow_Temperature_Cel -O---K   066   045   045    Past 34 (Min/Max 32/34)
194 Temperature_Celsius     -O---K   034   055   000    -    34 (0 7 0 0 0)
195 Hardware_ECC_Recovered  -O-RC-   060   056   000    -    152257666
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   196   000    -    89
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 GPL,SL R/O 1 Summary SMART error log
0x02 GPL,SL R/O 5 Comprehensive SMART error log
0x03 GPL,SL R/O 5 Ext. Comprehensive SMART error log
0x06 GPL,SL R/O 1 SMART self-test log
0x07 GPL,SL R/O 1 Extended self-test log
0x09 GPL,SL R/W 1 Selective self-test log
0x10 GPL,SL R/O 1 NCQ Command Error log
0x11 GPL,SL R/O 1 SATA Phy Event Counters log
0x21 GPL,SL R/O 1 Write stream error log
0x22 GPL,SL R/O 1 Read stream error log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa1 GPL,SL VS 20 Device vendor specific log
0xa2 GPL VS 2248 Device vendor specific log
0xa8 GPL,SL VS 129 Device vendor specific log
0xa9 GPL,SL VS 1 Device vendor specific log
0xb0 GPL VS 2928 Device vendor specific log
0xbd GPL,SL VS 252 Device vendor specific log
0xbe-0xbf GPL VS 65535 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 4016 -
# 2 Short offline Completed without error 00% 4015 -
# 3 Extended offline Completed without error 00% 4002 -
# 4 Short offline Completed without error 00% 4001 -
# 5 Extended offline Completed without error 00% 4000 -
# 6 Short offline Completed without error 00% 3996 -
# 7 Short offline Completed without error 00% 3996 -
# 8 Short offline Completed without error 00% 3991 -
# 9 Short offline Completed without error 00% 3991 -
#10 Short offline Completed without error 00% 3990 -
#11 Short offline Completed without error 00% 3990 -
#12 Short offline Completed without error 00% 3990 -
#13 Short offline Completed without error 00% 3934 -
#14 Short offline Completed without error 00% 3900 -
#15 Extended offline Completed without error 00% 3891 -
#16 Short offline Completed without error 00% 3890 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version: 3
SCT Version (vendor specific): 522 (0x020a)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 33 Celsius
Power Cycle Min/Max Temperature: 32/33 Celsius
Lifetime Min/Max Temperature: 7/55 Celsius
Under/Over Temperature Limit Count: 0/0

SCT Temperature History Version: 2
Temperature Sampling Period: 2 minutes
Temperature Logging Interval: 94 minutes
Min/Max recommended Temperature: 1/61 Celsius
Min/Max Temperature Limit: 2/60 Celsius
Temperature History Size (Index): 128 (119)

Index Estimated Time Temperature Celsius
120 2020-09-12 08:30 ? -
121 2020-09-12 10:04 30 ***********
122 2020-09-12 11:38 30 ***********
123 2020-09-12 13:12 34 ***************
124 2020-09-12 14:46 34 ***************
125 2020-09-12 16:20 33 **************
126 2020-09-12 17:54 ? -
127 2020-09-12 19:28 24 *****
0 2020-09-12 21:02 ? -
1 2020-09-12 22:36 27 ********
2 2020-09-13 00:10 27 ********
3 2020-09-13 01:44 ? -
4 2020-09-13 03:18 29 **********
5 2020-09-13 04:52 29 **********
6 2020-09-13 06:26 32 *************
7 2020-09-13 08:00 34 ***************
8 2020-09-13 09:34 ? -
9 2020-09-13 11:08 31 ************
10 2020-09-13 12:42 ? -
11 2020-09-13 14:16 32 *************
12 2020-09-13 15:50 32 *************
13 2020-09-13 17:24 34 ***************
14 2020-09-13 18:58 35 ****************
15 2020-09-13 20:32 ? -
16 2020-09-13 22:06 24 *****
17 2020-09-13 23:40 24 *****
18 2020-09-14 01:14 ? -
19 2020-09-14 02:48 30 ***********
20 2020-09-14 04:22 30 ***********
21 2020-09-14 05:56 34 ***************
22 2020-09-14 07:30 ? -
23 2020-09-14 09:04 29 **********
24 2020-09-14 10:38 29 **********
25 2020-09-14 12:12 33 **************
26 2020-09-14 13:46 32 *************
27 2020-09-14 15:20 33 **************
28 2020-09-14 16:54 33 **************
29 2020-09-14 18:28 ? -
30 2020-09-14 20:02 24 *****
31 2020-09-14 21:36 ? -
32 2020-09-14 23:10 24 *****
33 2020-09-15 00:44 24 *****
34 2020-09-15 02:18 ? -
35 2020-09-15 03:52 26 *******
36 2020-09-15 05:26 26 *******
37 2020-09-15 07:00 33 **************
38 2020-09-15 08:34 34 ***************
39 2020-09-15 10:08 ? -
40 2020-09-15 11:42 24 *****
41 2020-09-15 13:16 ? -
42 2020-09-15 14:50 25 ******
43 2020-09-15 16:24 ? -
44 2020-09-15 17:58 27 ********
45 2020-09-15 19:32 27 ********
46 2020-09-15 21:06 ? -
47 2020-09-15 22:40 27 ********
48 2020-09-16 00:14 27 ********
49 2020-09-16 01:48 34 ***************
50 2020-09-16 03:22 32 *************
51 2020-09-16 04:56 ? -
52 2020-09-16 06:30 24 *****
53 2020-09-16 08:04 ? -
54 2020-09-16 09:38 24 *****
55 2020-09-16 11:12 ? -
56 2020-09-16 12:46 31 ************
57 2020-09-16 14:20 ? -
58 2020-09-16 15:54 29 **********
59 2020-09-16 17:28 29 **********
60 2020-09-16 19:02 34 ***************
61 2020-09-16 20:36 ? -
62 2020-09-16 22:10 29 **********
63 2020-09-16 23:44 29 **********
64 2020-09-17 01:18 ? -
65 2020-09-17 02:52 25 ******
66 2020-09-17 04:26 25 ******
67 2020-09-17 06:00 33 **************
68 2020-09-17 07:34 ? -
69 2020-09-17 09:08 30 ***********
70 2020-09-17 10:42 ? -
71 2020-09-17 12:16 29 **********
72 2020-09-17 13:50 ? -
73 2020-09-17 15:24 29 **********
74 2020-09-17 16:58 ? -
75 2020-09-17 18:32 29 **********
76 2020-09-17 20:06 ? -
77 2020-09-17 21:40 24 *****
78 2020-09-17 23:14 ? -
79 2020-09-18 00:48 33 **************
80 2020-09-18 02:22 ? -
81 2020-09-18 03:56 35 ****************
82 2020-09-18 05:30 ? -
83 2020-09-18 07:04 24 *****
84 2020-09-18 08:38 24 *****
85 2020-09-18 10:12 34 ***************
86 2020-09-18 11:46 35 ****************
87 2020-09-18 13:20 ? -
88 2020-09-18 14:54 25 ******
89 2020-09-18 16:28 25 ******
90 2020-09-18 18:02 ? -
91 2020-09-18 19:36 24 *****
92 2020-09-18 21:10 ? -
93 2020-09-18 22:44 25 ******
94 2020-09-19 00:18 25 ******
95 2020-09-19 01:52 33 **************
96 2020-09-19 03:26 ? -
97 2020-09-19 05:00 25 ******
98 2020-09-19 06:34 25 ******
99 2020-09-19 08:08 32 *************
100 2020-09-19 09:42 31 ************
101 2020-09-19 11:16 32 *************
102 2020-09-19 12:50 ? -
103 2020-09-19 14:24 24 *****
104 2020-09-19 15:58 ? -
105 2020-09-19 17:32 27 ********
106 2020-09-19 19:06 27 ********
107 2020-09-19 20:40 34 ***************
108 2020-09-19 22:14 ? -
109 2020-09-19 23:48 24 *****
110 2020-09-20 01:22 24 *****
111 2020-09-20 02:56 34 ***************
112 2020-09-20 04:30 ? -
113 2020-09-20 06:04 34 ***************
114 2020-09-20 07:38 34 ***************
115 2020-09-20 09:12 ? -
116 2020-09-20 10:46 33 **************
117 2020-09-20 12:20 ? -
118 2020-09-20 13:54 32 *************
119 2020-09-20 15:28 32 *************

SCT Error Recovery Control:
Read: Disabled
Write: Disabled

Device Statistics (GP/SMART Log 0x04) not supported

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x000a 2 2 Device-to-host register FISes sent due to a COMRESET
0x0001 2 0 Command failed due to ICRC error
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
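
A report this long is easier to triage mechanically. Here is a rough sketch, assuming the whitespace-separated attribute layout pasted above (ID, name, flags, value, worst, threshold, fail, raw). The names `parse_attributes` and `worth_a_look` and the choice of watched IDs (5, 197, 198, 199) are illustrative assumptions, not anything smartctl itself defines.

```python
SAMPLE = """\
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct PO--CK 100 100 036 - 0
188 Command_Timeout -O--CK 099 001 000 - 2242007139067
190 Airflow_Temperature_Cel -O---K 066 045 045 Past 34 (Min/Max 32/34)
197 Current_Pending_Sector -O--C- 100 100 000 - 0
198 Offline_Uncorrectable ----C- 100 100 000 - 0
199 UDMA_CRC_Error_Count -OSRCK 200 196 000 - 89
"""


def parse_attributes(text):
    """Map attribute ID -> dict of the columns in a pasted attribute table."""
    rows = {}
    for line in text.splitlines():
        fields = line.split(None, 7)
        # Skip the header and anything that is not an attribute row.
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        ident, name, _flags, value, worst, thresh, fail, raw = fields
        rows[int(ident)] = {
            "name": name,
            "value": int(value),
            "worst": int(worst),
            "thresh": int(thresh),
            "fail": fail,
            "raw": raw.strip(),
        }
    return rows


def worth_a_look(rows, watched=(5, 197, 198, 199)):
    """Names of attributes whose FAIL column is set, or whose raw counter
    is nonzero among the watched IDs (an arbitrary illustrative choice)."""
    names = []
    for ident, row in sorted(rows.items()):
        first = row["raw"].split()[0]
        if row["fail"] != "-":
            names.append(row["name"])
        elif ident in watched and first.isdigit() and int(first) != 0:
            names.append(row["name"])
    return names


if __name__ == "__main__":
    for name in worth_a_look(parse_attributes(SAMPLE)):
        print(name)
```

On the sample rows this picks out Airflow_Temperature_Cel (FAIL column reads "Past") and UDMA_CRC_Error_Count (raw value 89), while the sector-health counters 5, 197 and 198 stay at zero. Command_Timeout is deliberately left out of the watched list because its raw value looks like a packed field rather than a plain count.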
David Christensen
2020-09-20 08:40:02 UTC
Post by Long Wind
On Sunday, September 20, 2020, 2:15:21 PM GMT+8, David Christensen
First, backup your data.
Please run the following command and post your complete console session
    # smartctl -x /dev/disk/by-id/DISKID
Thank you for posting the smartctl report. I don't see any obvious
problems.
Post by Long Wind
Thanks, David! I changed the data cable, but it doesn't seem to help. I'd better throw the disk away; the time I've spent serving it is far more than it has served me. And I doubt the credibility of the SMART report and the badblocks check: the disk nearly passed a badblocks test with the -w option.
That drive is SATA 3 Gbps. Are your SATA cables marked for 3 Gbps or
faster?
Post by Long Wind
/sbin/smartctl -x /dev/sdb
smartctl 6.6 2017-11-05 r4594 [i686-linux-4.19.0-10-686] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, http://www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Pipeline HD 5900.2
Device Model: ST3320311CS
Go through the test, wipe, and re-test process with SeaTools Bootable:

https://www.seagate.com/support/downloads/seatools/


David
Reco
2020-09-20 08:50:01 UTC
Hi.
Post by Long Wind
On Sunday, September 20, 2020, 2:15:21 PM GMT+8, David Christensen
First, backup your data.
Please run the following command and post your complete console session
    # smartctl -x /dev/disk/by-id/DISKID
Thank you for posting the smartctl report. I don't see any obvious problems.
I do. First, drive does not have any bad sectors,

197 Current_Pending_Sector -O--C- 100 100 000 - 0
198 Offline_Uncorrectable ----C- 100 100 000 - 0


Second,

SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x000a 2 2 Device-to-host register FISes sent due to a COMRESET
0x0001 2 0 Command failed due to ICRC error 0

[1] helpfully states:

CRC error during data transfer
This is indicated by ICRC bit in the ERROR register and means that
corruption occurred during data transfer. Up to ATA/ATAPI-7, the
standard specifies that this bit is only applicable to UDMA transfers
but ATA/ATAPI-8 draft revision 1f says that the bit may be applicable to
multiword DMA and PIO.


So, it's either a damaged SATA cable, or a damaged SATA port on a
motherboard. But the drive in question is Seagate, and for *those*
there's only one proper solution - throw the thing out into the nearest
garbage bin, and buy a real drive (WD or Toshiba).


Reco

[1] https://www.kernel.org/doc/htmldocs/libata/ataExceptions.html
David Christensen
2020-09-21 04:00:02 UTC
Post by Reco
Hi.
Hello. :-)
Post by Reco
Post by Long Wind
On Sunday, September 20, 2020, 2:15:21 PM GMT+8, David Christensen
First, backup your data.
Please run the following command and post your complete console session
    # smartctl -x /dev/disk/by-id/DISKID
Thank you for posting the smartctl report. I don't see any obvious problems.
I do. First, drive does not have any bad sectors,
197 Current_Pending_Sector -O--C- 100 100 000 - 0
198 Offline_Uncorrectable ----C- 100 100 000 - 0
Yes.
Post by Reco
Second,
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x000a 2 2 Device-to-host register FISes sent due to a COMRESET
0x0001 2 0 Command failed due to ICRC error 0
CRC error during data transfer
This is indicated by ICRC bit in the ERROR register and means that
corruption occurred during data transfer. Up to ATA/ATAPI-7, the
standard specifies that this bit is only applicable to UDMA transfers
but ATA/ATAPI-8 draft revision 1f says that the bit may be applicable to
multiword DMA and PIO.
So, it's either a damaged SATA cable, or a damaged SATA port on a
motherboard.
[1] https://www.kernel.org/doc/htmldocs/libata/ataExceptions.html
Without specialized test equipment, about all I can do is buy surplus
hardware and apply a process of elimination.


David
Reco
2020-09-21 05:30:01 UTC
Hi.
Post by Reco
Second,
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x000a 2 2 Device-to-host register FISes sent due to a COMRESET
0x0001 2 0 Command failed due to ICRC error 0
CRC error during data transfer
This is indicated by ICRC bit in the ERROR register and means that
corruption occurred during data transfer. Up to ATA/ATAPI-7, the
standard specifies that this bit is only applicable to UDMA transfers
but ATA/ATAPI-8 draft revision 1f says that the bit may be applicable to
multiword DMA and PIO.
So, it's either a damaged SATA cable, or a damaged SATA port on a
motherboard.
Without specialized test equipment, about all I can do is buy surplus hardware and apply a process of elimination.
I'd do the same, starting with SATA cable (the cheapest part of the
equation).

Reco
Miles Fidelman
2020-09-21 22:20:01 UTC
    Hi.
Hello.  :-)
      On Sunday, September 20, 2020, 2:15:21 PM GMT+8, David
Christensen
First, backup your data.
Please run the following command and post your complete console session
      # smartctl -x /dev/disk/by-id/DISKID
Thank you for posting the smartctl report.  I don't see any obvious
problems.
I do. First, drive does not have any bad sectors,
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
Yes.
I'd also do a smartctl -A and look at the raw read errors. Depending on
the disk manufacturer, that can be a good indicator that there's a
failing sector, and the drive is doing lots of reads before it can
actually access the data in a block or sector.
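
To make the "depending on the disk manufacturer" point concrete: on Seagate drives the raw value of Raw_Read_Error_Rate is widely reported to be a packed 48-bit number, with the error count in the upper 16 bits and an operation count in the lower 32 bits. That layout is not something this thread or the smartctl output confirms, so treat it as an assumption; `split_seagate_raw` is a hypothetical helper.

```python
def split_seagate_raw(raw):
    """Split a 48-bit raw value into (errors, operations).

    ASSUMPTION: upper 16 bits = error count, lower 32 bits = operation
    count, as commonly described for Seagate attributes 1 and 7.
    """
    if not 0 <= raw < (1 << 48):
        raise ValueError("raw value does not fit in 48 bits")
    return raw >> 32, raw & 0xFFFFFFFF


if __name__ == "__main__":
    # Raw_Read_Error_Rate as posted earlier in the thread.
    errors, operations = split_seagate_raw(152257666)
    print(f"errors={errors} operations={operations}")
```

Under that assumption the posted value 152257666 is below 2**32, so the error part is zero and the large number is only an operation counter.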
--
In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra

Theory is when you know everything but nothing works.
Practice is when everything works but no one knows why.
In our lab, theory and practice are combined:
nothing works and no one knows why. ... unknown
Long Wind
2020-09-20 09:40:02 UTC
On Sunday, September 20, 2020, 4:33:21 AM EDT, David Christensen <***@holgerdanske.com> wrote:

Thank you for posting the smartctl report.  I don't see any obvious
problems.

That drive is SATA 3 Gbps.  Are your SATA cables marked for 3 Gbps or
faster?

Go through the test, wipe, and re-test process with Seatools Bootable:

https://www.seagate.com/support/downloads/seatools/


Thanks, David and Reco! The data cable isn't marked 3 Gbps, but it works fine with another 300G disk. It's easy to buy a SATA2 data cable, but does the mainboard support SATA2? Can't a SATA2 disk work compatibly in an old PC that supports SATA1 only? It's too bad if it can't.
I've left home and am staying at another place, so I can't use the problem disk for a long time.
deloptes
2020-09-20 12:30:01 UTC
Post by Long Wind
Thanks, David and Reco! The data cable isn't marked 3 Gbps, but it works
fine with another 300G disk. It's easy to buy a SATA2 data cable, but does
the mainboard support SATA2? Can't a SATA2 disk work compatibly in an old
PC that supports SATA1 only? It's too bad if it can't. I've left home and
am staying at another place, so I can't use the problem disk for a long
time.
The negotiated data-transfer mode is 1.5 Gbps, the lowest possible, which
might mean that you have an old controller, a cable issue, or some kind of
incompatibility. An old motherboard is also a possibility.
As Reco said, Seagate has a bad reputation. Nowadays manufacturers do not
test hardware with older boards/controllers; no one has the time to test
extensively, so it might be that the drive works on some boards and gives
problems on others.
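
The negotiated speed can be read straight off the "SATA Version is:" line of the smartctl report. Below is a small sketch that pulls out the drive's maximum and current link speed; the helper name `link_speeds` is made up, and the regular expression assumes the exact wording of the line posted earlier in the thread.

```python
import re


def link_speeds(line):
    """Return (max_gbps, current_gbps) from a 'SATA Version is:' line,
    or None if the line does not carry both figures."""
    match = re.search(r"([\d.]+) Gb/s \(current: ([\d.]+) Gb/s\)", line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


if __name__ == "__main__":
    line = "SATA Version is: SATA 2.6, 3.0 Gb/s (current: 1.5 Gb/s)"
    speeds = link_speeds(line)
    if speeds is not None and speeds[1] < speeds[0]:
        print(f"link negotiated down: {speeds[1]} of {speeds[0]} Gb/s")
```

A lower current speed is not a fault by itself: a controller that only supports 1.5 Gb/s will always negotiate down, so the figure only narrows the search when the port is known to support more.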
Long Wind
2020-09-22 04:20:01 UTC
On Sunday, September 20, 2020, 4:33:21 PM GMT+8, David Christensen
Go through the test, wipe, and re-test process with Seatools Bootable:

https://www.seagate.com/support/downloads/seatools/


David
David Christensen
2020-09-22 06:50:01 UTC
Seagate's new tool is for Windows, and I haven't used Windows for a long time. I've read their guide for the old tool; its function looks like the SMART tool's. I'm afraid it won't be helpful to my diagnosis effort.
Seagate SeaTools Bootable is a USB flash drive live Linux distribution
with an app for testing Seagate products. Microsoft Windows is not
required:

https://www.seagate.com/support/downloads/seatools/
My HP dx5150 PC probably supports SATA1 only. It works well with a Western Digital SATA3 320G disk, but seems to have trouble with the Seagate problem disk, which probably caused 2 Linux installation failures.
That is a vintage computer. But, I would expect it to work with the
Seagate ST3320311CS; if both are in proper working order.
The problem disk seems to have less trouble with my more modern ThinkCentre, which supports SATA2.
Test the Seagate ST3320311CS using the ThinkCentre and Seagate SeaTools
Bootable.


David
Long Wind
2020-09-22 10:20:01 UTC
On Tuesday, September 22, 2020, 2:45:30 AM EDT, David Christensen <***@holgerdanske.com> wrote:

Seagate SeaTools Bootable is a USB flash drive live Linux distribution
with an app for testing Seagate products.  Microsoft Windows is not
required:

https://www.seagate.com/support/downloads/seatools/

That is a vintage computer.  But, I would expect it to work with the
Seagate ST3320311CS; if both are in proper working order.

Test the Seagate ST3320311CS using the ThinkCentre and Seagate SeaTools
Bootable.
It has only an exe file, no ISO file; you need Windows:
https://www.seagate.com/files/old-support-files/seatools/USBbootSetup-SeaToolsBootable.zip
And I have little reason to do more tests. The disk has passed the short and long SMART tests and the badblocks tests. I've moved it from the HP to the ThinkCentre and easily created a filesystem. Its problem with the HP won't be solved by a tool from Seagate; I'll just use it in the ThinkCentre.
Thanks to Reco for the educational post on bad sectors! I've just read it and found it instructive.
David Wright
2020-09-22 13:10:02 UTC
Post by David Christensen
Seagate SeaTools Bootable is a USB flash drive live Linux distribution
with an app for testing Seagate products.  Microsoft Windows is not
https://www.seagate.com/support/downloads/seatools/
That is a vintage computer.  But, I would expect it to work with the
Seagate ST3320311CS; if both are in proper working order.
Test the Seagate ST3320311CS using the ThinkCentre and Seagate SeaTools
Bootable.
It has only an exe file, no ISO file; you need Windows:
https://www.seagate.com/files/old-support-files/seatools/USBbootSetup-SeaToolsBootable.zip
And I have little reason to do more tests. The disk has passed the short and long SMART tests and the badblocks tests. I've moved it from the HP to the ThinkCentre and easily created a filesystem. Its problem with the HP won't be solved by a tool from Seagate; I'll just use it in the ThinkCentre.
Thanks to Reco for the educational post on bad sectors! I've just read it and found it instructive.
You picked the wrong file to download. Go back to the
https://www.seagate.com/support/downloads/seatools/
page and read the text in the corner of the browser
window as the mouse hovers over the big blobs.
The second blob down goes to
https://www.seagate.com/support/downloads/seatools/seatools-legacy-support-master/
Here you will find graphical and text versions that
are bootable ISOs which run under FreeDOS:
https://www.seagate.com/files/www-content/support-content/downloads/seatools/_shared/downloads/SeaToolsDOS223ALL.ISO
and
https://www.seagate.com/files/www-content/support-content/downloads/seatools/_shared/downloads/SeaToolsDOS112.ISO
with instructions:
https://www.seagate.com/files/staticfiles/support/downloads/seatools/seatools-dos-guide.pdf
Download those.

Cheers,
David.
David Christensen
2020-09-22 22:50:01 UTC
Post by David Wright
Post by Long Wind
Post by David Christensen
https://www.seagate.com/support/downloads/seatools/
it has only exe file, no iso file, you need Windows
You picked the wrong file to download. Go back to the
https://www.seagate.com/support/downloads/seatools/
page and read the text in the corner of the browser
window as the mouse hovers over the big blobs.
The second blob down goes to
https://www.seagate.com/support/downloads/seatools/seatools-legacy-support-master/
Here you will find graphical and text versions that
https://www.seagate.com/files/www-content/support-content/downloads/seatools/_shared/downloads/SeaToolsDOS223ALL.ISO
and
https://www.seagate.com/files/www-content/support-content/downloads/seatools/_shared/downloads/SeaToolsDOS112.ISO
https://www.seagate.com/files/staticfiles/support/downloads/seatools/seatools-dos-guide.pdf
Download those.
Thank you for finding the old ISO versions (I recall using those in the
past). Given the age of the OP's computers, there is a good chance he
has an optical drive. Hopefully, it works.


David
David Christensen
2020-09-22 21:00:02 UTC
Permalink
Post by David Christensen
Seagate SeaTools Bootable is a USB flash drive live Linux distribution
with an app for testing Seagate products.  Microsoft Windows is not
https://www.seagate.com/support/downloads/seatools/
That is a vintage computer.  But, I would expect it to work with the
Seagate ST3320311CS; if both are in proper working order.
Test the Seagate ST3320311CS using the ThinkCentre and Seagate SeaTools
Bootable.
it has only exe file, no iso file, you need Windows
Sorry about that. I burned Seagate SeaTools Bootable to a USB flash
drive a few years ago and forgot the details.


I have sent a support request to Seagate asking them to post an *.img
file of SeaTools Bootable that can be copied to a USB flash drive using
Linux/ BSD/ Unix and dd(1).


That said, the harsh reality of running Linux, BSD, etc., on Intel x86/
x86-64 hardware is that you must maintain at least one working Windows
installation.


I bought a Windows 7 Pro COA from a recycled laptop on eBay for $25.
These are still available.


I am disinclined to set up a Windows 10 installation as it involves
joining the Microsoft collective. I seem to recall reading a post that
there is a way to bypass assimilation -- I will need to research it when
Windows 7 no longer meets my needs.


Perhaps you should use the Seagate ST3320311CS for Windows.
Post by David Christensen
https://www.seagate.com/files/old-support-files/seatools/USBbootSetup-SeaToolsBootable.zip
and i have little reason to do more test. it has passed short and long tests by smart and badblocks tests. i've removed it from hp to thinkcentre,
So, the drive passed all smartctl(8) and badblocks(8) tests in the HP,
the ThinkCentre, or both?


Have you purchased and installed new SATA 6 Gbps cables?
Post by David Christensen
and easily created FS. its problem with hp won't be solved by tool from seagate. i'll just use it in thinkcentre.
Be sure to use a file system that can detect bit rot -- either btrfs or
ZFS -- and back up frequently.


David
Anssi Saari
2020-09-23 07:20:01 UTC
Permalink
Post by David Christensen
Test the Seagate ST3320311CS using the ThinkCentre and Seagate SeaTools
Bootable.
it has only exe file, no iso file, you need Windows
https://www.seagate.com/files/old-support-files/seatools/USBbootSetup-SeaToolsBootable.zip
Actually 7zip can extract the files from that in Linux. So it might be
possible to make a USB stick out of this with some work in Linux. Might
be as simple as copying the extracted files to the stick and installing
syslinux on it.

Then again, as UBCD for example includes this (well, SeaTools for DOS),
why not just download that and have a USB stick with a lot of other
utilities too? Or use the ISO for Seatools for DOS?

Reco
2020-09-20 08:00:01 UTC
Permalink
Hi.
Post by Long Wind
i'm creating FS on problem disk
though it has passed short and long tests by smart tool
i meet bad sector, mkfs complains forever
Ctrl+C can't kill it, what should i do NOW??
[ 2719.089156] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 2719.089171] ata4.00: failed command: WRITE DMA EXT
[ 2719.089185] ata4.00: cmd 35/00:00:18:68:e1/00:08:0c:00:00/e0 tag 0 dma 1048576 out
                        res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 2719.089192] ata4.00: status: { DRDY }
That's not a "bad sector". This is a "bad sector":

Jan 29 07:41:01 xxx kernel: [5687751.356991] sd 0:x:x:x: [sdx] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jan 29 07:41:01 xxx kernel: [5687751.357016] sd 0:x:x:x: [sdx] tag#0 Sense Key : Medium Error [current]
Jan 29 07:41:01 xxx kernel: [5687751.357024] sd 0:x:x:x: [sdx] tag#0 Add. Sense: Unrecovered read error
Jan 29 07:41:01 xxx kernel: [5687751.357030] sd 0:x:x:x: [sdx] tag#0 CDB: Read(10) 28 00 01 0e f0 80 00 00 80 00
Jan 29 07:41:01 xxx kernel: [5687751.357036] blk_update_request: critical medium error, dev sdx, sector 17756288

And you can easily tell one from another. Yours is saying "timeout".
Mine's saying "critical medium error".
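
The distinction above can be checked mechanically. The following is a minimal sketch, assuming the two message wordings quoted in this thread; the regular expressions and the `classify` helper are illustrative names, and other kernels or drivers may phrase these messages differently.

```python
import re

# Patterns taken from the two log excerpts quoted in this thread.
# Assumption: other kernel versions or drivers may word these differently.
TIMEOUT = re.compile(r"Emask 0x[0-9a-f]+ \(timeout\)")
MEDIUM = re.compile(r"Medium Error|critical medium error")


def classify(line):
    """Return 'timeout', 'medium-error', or None for one kernel log line."""
    if TIMEOUT.search(line):
        return "timeout"
    if MEDIUM.search(line):
        return "medium-error"
    return None


samples = [
    "res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)",
    "blk_update_request: critical medium error, dev sdx, sector 17756288",
    "ata4: hard resetting link",
]
results = [classify(s) for s in samples]
print(results)
```

A line that matches neither pattern is reported as `None` rather than guessed at, since link resets and similar messages accompany both kinds of failure.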

What you're seeing is your drive's firmware hanging while processing a
certain SCSI command. Could be the drive itself, its firmware, or (as
others helpfully suggested) the SATA cable.
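
The command that timed out can be read back from the `cmd` field of the quoted log. This is a sketch only: the field layout (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) is assumed from how libata prints its error reports and is not stated in this thread, and the 512-byte sector size is also an assumption. The function name `decode_ata_cmd` is hypothetical.

```python
def decode_ata_cmd(field):
    """Decode the 'cmd' field of a libata error report for a 48-bit command.

    Assumed layout (not confirmed in this thread):
    command/feature:nsect:lbal:lbam:lbah/
    hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    """
    command, low, high, device = field.split("/")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in high.split(":")
    )
    # 48-bit LBA: the "hob" (high order byte) registers hold the upper half.
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    # In 48-bit commands a sector count of 0 means 65536 sectors.
    sectors = ((hob_nsect << 8) | nsect) or 65536
    return {"command": int(command, 16), "lba": lba, "sectors": sectors}


info = decode_ata_cmd("35/00:00:18:68:e1/00:08:0c:00:00/e0")
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors
print(hex(info["command"]), info["lba"], info["sectors"] * SECTOR_SIZE)
```

Under these assumptions the decoded transfer size is 2048 sectors, which agrees with the `dma 1048576 out` figure in the same log line, and opcode 0x35 agrees with the `WRITE DMA EXT` named by the kernel.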

To answer your question - Linux can handle bad sectors just fine. It's
the failing hardware (especially consumer-quality failing hardware) that
it has difficulties with.
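
For the medium error example above, the failing location can be cross-checked against the command bytes in the `CDB:` line. The sketch below decodes the READ(10) command block using its standard layout (opcode in byte 0, logical block address in bytes 2 to 5, transfer length in bytes 7 to 8); the helper name `decode_read10` is hypothetical.

```python
def decode_read10(cdb_hex):
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks


lba, blocks = decode_read10("28 00 01 0e f0 80 00 00 80 00")
print(lba, blocks)
```

The decoded address equals the `sector 17756288` reported by `blk_update_request` in the same excerpt, so here the kernel is naming the first block of the 128-block read that failed.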

Reco
Miles Fidelman
2020-09-20 16:30:01 UTC
Permalink
Post by Long Wind
i'm creating FS on problem disk
though it has passed short and long tests by smart tool
i meet bad sector, mkfs complains forever
Ctrl+C can't kill it, what should i do NOW??
[ 2719.089156] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 2719.089171] ata4.00: failed command: WRITE DMA EXT
[ 2719.089185] ata4.00: cmd 35/00:00:18:68:e1/00:08:0c:00:00/e0 tag 0 dma 1048576 out
                        res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 2719.089192] ata4.00: status: { DRDY }
[ 2719.089209] ata4: hard resetting link
[ 2719.404384] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 2719.412162] ata4.00: configured for UDMA/33
[ 2719.412199] ata4: EH complete
[ 2749.813187] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 2749.813201] ata4.00: failed command: WRITE DMA EXT
[ 2749.813208] ata4.00: cmd 35/00:00:18:68:e1/00:08:0c:00:00/e0 tag 0 dma 1048576 out
                        res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 2749.813212] ata4.00: status: { DRDY }
One thing to check is whether part of what's happening is driven by the
onboard disk driver.  Consumer grade drives try very hard to read bad
blocks, leading to very long timeouts that drag a machine to its knees. 
(Learned this the hard way, trying to figure out why a server, with
raided disks, just kept going slower... and slower... and slower.)

By contrast, server-grade drives just give up after the first try -
letting RAID do its thing.
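
The length of each stall can be estimated from the timestamps in the quoted log. This is a rough sketch using three lines copied from the original post; the `parse` helper is an illustrative name. The comparison with a 30 second kernel command timeout assumes the usual Linux default was in effect on that machine, which the thread does not confirm (the current value can be read from `/sys/block/<dev>/device/timeout`).

```python
import re

# Three lines copied from the log in the original post.
LOG = """\
[ 2719.089156] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 2719.412199] ata4: EH complete
[ 2749.813187] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
"""

STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse(log):
    """Return (seconds_since_boot, message) pairs for each timestamped line."""
    events = []
    for line in log.splitlines():
        m = STAMP.match(line)
        if m:
            events.append((float(m.group(1)), m.group(2)))
    return events


events = parse(LOG)
exceptions = [t for t, msg in events if "exception" in msg]
eh_done = [t for t, msg in events if "EH complete" in msg]

retry_gap = exceptions[1] - eh_done[0]  # recovery finished -> next failure
cycle = exceptions[1] - exceptions[0]   # one full fail/reset/retry cycle
print(f"retry gap {retry_gap:.3f} s, full cycle {cycle:.3f} s")
```

Each retry of the same write takes a little over 30 seconds before the kernel gives up and resets the link again, which would explain why mkfs appears to hang for so long.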

You might want to check the specs on your drive, and run a deep set of
diagnostics, starting with the more intrusive smart diagnostics.

Miles Fidelman
--
In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra

Theory is when you know everything but nothing works.
Practice is when everything works but no one knows why.
In our lab, theory and practice are combined:
nothing works and no one knows why. ... unknown