
zongwu's blog

Fixing Frequent System Freezes with USB 3.0 External Hard Drives on Linux

Problem

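On a ROCK Pi (running Ubuntu 20.04), large-capacity hard drives are connected over USB 3.0. Reading data from them frequently freezes the whole system, and the only way to recover is to reboot the machine.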

Some of the log messages observed:

[  529.728684] xhci-hcd xhci-hcd.9.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
[  529.729554] xhci-hcd xhci-hcd.9.auto: @00000000db5ca2b0 00000000 00000000 1b000000 1a078001
[  533.658284] xhci-hcd xhci-hcd.9.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
[  533.659159] xhci-hcd xhci-hcd.9.auto: @00000000db5ca700 00000000 00000000 1b000000 18078001
[  536.888700] xhci-hcd xhci-hcd.9.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
[  536.889570] xhci-hcd xhci-hcd.9.auto: @00000000db5ca340 00000000 00000000 1b000000 1a078001
[  547.392564] xhci-hcd xhci-hcd.9.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
[  547.393418] xhci-hcd xhci-hcd.9.auto: @00000000db5ca990 00000000 00000000 1b000000 18078000
[  551.313637] xhci-hcd xhci-hcd.9.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
[  551.314494] xhci-hcd xhci-hcd.9.auto: @00000000db5caf50 00000000 00000000 1b000000 18078000
[  567.382210] xhci-hcd xhci-hcd.9.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
[  567.383086] xhci-hcd xhci-hcd.9.auto: @00000000db5cab30 00000000 00000000 1b000000 18078000
[  581.136868] xhci-hcd xhci-hcd.9.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
[  581.137739] xhci-hcd xhci-hcd.9.auto: @00000000db5cae50 00000000 00000000 1b000000 18078000
[  585.063728] xhci-hcd xhci-hcd.9.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
[  585.064582] xhci-hcd xhci-hcd.9.auto: @00000000db5ca410 00000000 00000000 1b000000 18078001
[  598.768816] xhci-hcd xhci-hcd.9.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
[  598.769687] xhci-hcd xhci-hcd.9.auto: @00000000db5caa20 00000000 00000000 1b000000 18078001
[  602.693613] xhci-hcd xhci-hcd.9.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
[  602.694491] xhci-hcd xhci-hcd.9.auto: @00000000db5cafc0 00000000 00000000 1b000000 18078001
[  616.468609] xhci-hcd xhci-hcd.9.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
[  616.469481] xhci-hcd xhci-hcd.9.auto: @00000000db5caf50 00000000 00000000 1b000000 18078001
[  620.399073] xhci-hcd xhci-hcd.9.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
[  620.399952] xhci-hcd xhci-hcd.9.auto: @00000000db5ca510 00000000 00000000 1b000000 18078000
[  634.432399] xhci-hcd xhci-hcd.9.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
[  634.433254] xhci-hcd xhci-hcd.9.auto: @00000000db5cab20 00000000 00000000 1b000000 18078000
[  638.353145] xhci-hcd xhci-hcd.9.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
[  638.353998] xhci-hcd xhci-hcd.9.auto: @00000000db5ca0e0 00000000 00000000 1b000000 18078001
[  654.242082] xhci-hcd xhci-hcd.9.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
[  654.242945] xhci-hcd xhci-hcd.9.auto: @00000000db5ca140 00000000 00000000 1b000000 18078000
[  684.977605] xhci-hcd xhci-hcd.9.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
[  684.978471] xhci-hcd xhci-hcd.9.auto: @00000000db5cad50 00000000 00000000 1b000000 18078000
[  684.979676] xhci-hcd xhci-hcd.9.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
[  684.980530] xhci-hcd xhci-hcd.9.auto: @00000000db5cad70 00000000 00000000 1b000000 18058000
[  690.469456] usb 6-1.1.1: device not accepting address 59, error -71
[  762.191000] xhci-hcd xhci-hcd.8.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
[  762.191858] xhci-hcd xhci-hcd.8.auto: @00000000eeb957d0 00000000 00000000 1b000000 11078001
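To check whether a freeze was preceded by this xHCI error, it can help to count how often the message appears in the kernel log. A minimal sketch; count_xhci_errors is a hypothetical helper name, and it assumes the log text arrives on standard input in the format shown above:

```sh
# count_xhci_errors: hypothetical helper, not part of any package.
# Reads kernel log text on stdin and prints how many
# "Transfer event for disabled endpoint" errors it contains.
count_xhci_errors() {
    # grep -c prints the number of matching lines (0 if none);
    # "|| true" keeps a zero count from being reported as a failure.
    grep -c 'ERROR Transfer event for disabled endpoint' || true
}

# Usage on a live system (dmesg may require root if kernel.dmesg_restrict=1):
# dmesg | count_xhci_errors
```

Running it periodically would show whether the count keeps climbing in the minutes before a hang.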

This then escalates into a fatal kernel panic:

[  762.907283] blk_update_request: I/O error, dev sdl, sector 6511155816
[  840.425331] INFO: task kworker/0:0:4 blocked for more than 120 seconds.
[  840.425933]       Not tainted 4.4.154-112-rockchip-gfdb18c8bab17 #1
[  840.426499] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  840.427479] Kernel panic - not syncing: hung_task: blocked tasks
[  840.428026] CPU: 4 PID: 39 Comm: khungtaskd Not tainted 4.4.154-112-rockchip-gfdb18c8bab17 #1
[  840.428782] Hardware name: ROCK PI 4B (DT)
[  840.429153] Call trace:
[  840.429387] [<ffffff80080888d8>] dump_backtrace+0x0/0x220
[  840.429872] [<ffffff8008088b1c>] show_stack+0x24/0x30
[  840.430334] [<ffffff800856ebec>] dump_stack+0x98/0xc0
[  840.430798] [<ffffff80081724bc>] panic+0xe8/0x23c
[  840.431226] [<ffffff8008139e90>] proc_dohung_task_timeout_secs+0x0/0x7c
[  840.431817] [<ffffff80080ba310>] kthread+0xe0/0xf0
[  840.432252] [<ffffff8008082ef0>] ret_from_fork+0x10/0x20
[  840.432748] CPU0: stopping
[  840.433037] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.154-112-rockchip-gfdb18c8bab17 #1
[  840.433784] Hardware name: ROCK PI 4B (DT)
[  840.434157] Call trace:
[  840.434413] [<ffffff80080888d8>] dump_backtrace+0x0/0x220
[  840.434913] [<ffffff8008088b1c>] show_stack+0x24/0x30
[  840.435377] [<ffffff800856ebec>] dump_stack+0x98/0xc0
[  840.435841] [<ffffff800808db74>] handle_IPI+0x1d0/0x248
[  840.436313] [<ffffff8008080f24>] gic_handle_irq+0x17c/0x180
[  840.436818] Exception stack(0xffffff8009243d70 to 0xffffff8009243ea0)

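Other users have run into similar problems:

see here^1 and here^2. In the second thread, the author suspects a problem with the USB drivers on the system.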

The temporary workaround is to disable the UAS kernel driver, at the cost of lower read/write speeds.

UAS

UAS stands for USB Attached SCSI. It uses the USB Mass Storage Class (MSC) protocol known as the USB Attached SCSI Protocol (UASP).

For more on the USB protocols involved, see^3.

List the modules currently loaded by the kernel:

$ lsmod
Module                  Size  Used by
xt_conntrack           16384  1
ipt_MASQUERADE         16384  1
nf_nat_masquerade_ipv4    16384  1 ipt_MASQUERADE
nf_conntrack_netlink    36864  0
xt_addrtype            16384  2
iptable_filter         16384  1
iptable_nat            16384  1
nf_conntrack_ipv4      24576  2
nf_defrag_ipv4         16384  1 nf_conntrack_ipv4
nf_nat_ipv4            16384  1 iptable_nat
nf_nat                 20480  2 nf_nat_ipv4,nf_nat_masquerade_ipv4
nf_conntrack          126976  6 nf_nat,nf_nat_ipv4,xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_netlink,nf_conntrack_ipv4
overlay                45056  1
binfmt_misc            20480  1
uas                    20480  0
usb_storage            61440  15 uas
bcmdhd               1183744  0
autofs4                40960  3

The uas module is in the list.
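The same check can be scripted. A minimal sketch, assuming lsmod output on standard input (a header line first, the module name in the first column); uas_loaded is a hypothetical name:

```sh
# uas_loaded: hypothetical helper. Reads `lsmod` output on stdin and
# succeeds (exit 0) only if a module named exactly "uas" is listed.
uas_loaded() {
    # Column 1 is the module name; NR > 1 skips the header line.
    # Matching on $1 avoids false hits from the "Used by" column,
    # where "uas" also appears next to usb_storage.
    awk 'NR > 1 && $1 == "uas" { found = 1 } END { exit !found }'
}

# Usage:
# lsmod | uas_loaded && echo "uas is loaded"
```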

List the USB devices on the system:

$ lsusb -t
/:  Bus 08.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
/:  Bus 07.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
/:  Bus 06.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M
        |__ Port 1: Dev 3, If 0, Class=Hub, Driver=hub/4p, 5000M
            |__ Port 1: Dev 5, If 0, Class=Mass Storage, Driver=uas, 5000M
            |__ Port 2: Dev 7, If 0, Class=Mass Storage, Driver=uas, 5000M
            |__ Port 3: Dev 10, If 0, Class=Mass Storage, Driver=uas, 5000M
            |__ Port 4: Dev 13, If 0, Class=Mass Storage, Driver=uas, 5000M
        |__ Port 2: Dev 4, If 0, Class=Hub, Driver=hub/4p, 5000M
            |__ Port 1: Dev 8, If 0, Class=Mass Storage, Driver=uas, 5000M
            |__ Port 2: Dev 11, If 0, Class=Mass Storage, Driver=uas, 5000M
            |__ Port 3: Dev 14, If 0, Class=Mass Storage, Driver=uas, 5000M
            |__ Port 4: Dev 17, If 0, Class=Mass Storage, Driver=uas, 5000M
        |__ Port 3: Dev 6, If 0, Class=Hub, Driver=hub/4p, 5000M
            |__ Port 1: Dev 22, If 0, Class=Mass Storage, Driver=uas, 5000M
            |__ Port 2: Dev 12, If 0, Class=Mass Storage, Driver=uas, 5000M
            |__ Port 3: Dev 16, If 0, Class=Mass Storage, Driver=uas, 5000M
            |__ Port 4: Dev 23, If 0, Class=Mass Storage, Driver=uas, 5000M
        |__ Port 4: Dev 9, If 0, Class=Hub, Driver=hub/4p, 5000M
            |__ Port 1: Dev 15, If 0, Class=Mass Storage, Driver=uas, 5000M
            |__ Port 2: Dev 18, If 0, Class=Mass Storage, Driver=uas, 5000M
            |__ Port 3: Dev 19, If 0, Class=Mass Storage, Driver=uas, 5000M
            |__ Port 4: Dev 20, If 0, Class=Mass Storage, Driver=uas, 5000M
/:  Bus 05.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 1: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 2: Dev 4, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 3: Dev 5, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 4: Dev 6, If 0, Class=Hub, Driver=hub/4p, 480M
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=ohci-platform/1p, 12M
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=ohci-platform/1p, 12M
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-platform/1p, 480M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-platform/1p, 480M

Each entry showing Class=Mass Storage, Driver=uas, 5000M is a device that uses uas as its driver.
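To count how many interfaces are currently bound to uas, a small filter over lsusb -t is enough. A sketch, assuming the "Driver=uas," formatting shown above (other usbutils versions may lay out the tree slightly differently); uas_devices is a hypothetical name:

```sh
# uas_devices: hypothetical helper. Reads `lsusb -t` output on stdin and
# prints how many interfaces are bound to the uas driver.
uas_devices() {
    # Match "Driver=uas," exactly so "Driver=usb-storage" is not counted;
    # "|| true" keeps a zero count from being reported as a failure.
    grep -c 'Driver=uas,' || true
}

# Usage:
# lsusb -t | uas_devices
```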

One option is to disable uas completely on the system:

$ sudo vim  /etc/modprobe.d/blacklist.conf

and append at the end:

blacklist uas

Save the file and reboot.
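If you do try the blacklist, it is worth making the edit idempotent so that running it twice does not add a duplicate line. A minimal sketch; blacklist_module is a hypothetical helper, and the target file is passed as an argument so it can be tried on a scratch file first:

```sh
# blacklist_module: hypothetical helper. Appends "blacklist <module>" to the
# given modprobe.d file unless an identical line is already present.
blacklist_module() {
    module=$1
    conf=$2
    # -x matches whole lines, -F treats the pattern as a literal string;
    # a missing file counts as "not present" and is created by >>.
    if ! grep -qxF "blacklist $module" "$conf" 2>/dev/null; then
        printf 'blacklist %s\n' "$module" >> "$conf"
    fi
}

# Usage (as root):
# blacklist_module uas /etc/modprobe.d/blacklist.conf
```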

However, testing showed that with this alone, the devices that had been using uas are left without any driver and stop working.

So these devices need to be pointed explicitly at the more basic usb-storage module as their driver, with uas disabled for them.

First, get each device's idVendor:idProduct pair:

$ lsusb  | awk '{print $6}' | sort -u
05e3:0610
05e3:0626
152d:0578
152d:9561
1d6b:0001
1d6b:0002
1d6b:0003
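Typing the quirks value by hand makes it easy to miss an ID. It can instead be generated from the lsusb output. A sketch, assuming the usual "Bus 006 Device 005: ID 152d:9561 ..." line format, where the ID is the sixth field; build_quirks is a hypothetical name:

```sh
# build_quirks: hypothetical helper. Reads `lsusb` output on stdin and prints
# a usb-storage quirks value with the "u" (ignore UAS) flag for every ID.
build_quirks() {
    # Field 6 of "Bus 006 Device 005: ID 152d:9561 ..." is VID:PID.
    # LC_ALL=C gives a locale-independent sort order;
    # sed appends ":u" to each ID, paste joins the lines with commas.
    awk '{ print $6 }' | LC_ALL=C sort -u | sed 's/$/:u/' | paste -s -d ',' -
}

# Usage:
# lsusb | build_quirks
```

The 1d6b:* IDs belong to the Linux Foundation root hubs. Those never bind to usb-storage, so a u quirk on them has no effect and including them is harmless.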

/etc/modprobe.d/ 目录下添加一个文件disable-uas.conf (名字可以任意定)

$ sudo vim /etc/modprobe.d/disable-uas.conf

with the following content:

options usb-storage quirks=05e3:0610:u,05e3:0626:u,152d:0578:u,152d:9561:u,1d6b:0001:u,1d6b:0002:u,1d6b:0003:u

Then run:

sudo update-initramfs -u
sudo reboot

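This rebuilds the initramfs and reboots the machine. Rebuilding the initramfs matters because initramfs-tools copies the /etc/modprobe.d configuration into it, so the option also applies if usb-storage is loaded early in boot.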
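After the reboot, the value the running usb-storage module actually uses can be read from /sys/module/usb_storage/parameters/quirks. The sketch below checks that a given VID:PID carries the u flag. has_uas_quirk is a hypothetical helper, and it assumes the comma-separated VID:PID:FLAGS format used above:

```sh
# has_uas_quirk: hypothetical helper. Succeeds if the comma-separated quirks
# string $1 contains the VID:PID $2 with "u" among its flags.
has_uas_quirk() {
    printf '%s\n' "$1" | tr ',' '\n' | awk -F: -v id="$2" '
        # Each entry is VID:PID:FLAGS; IDs are compared case-insensitively.
        tolower($1 ":" $2) == tolower(id) && index($3, "u") { found = 1 }
        END { exit !found }'
}

# Usage:
# q=$(cat /sys/module/usb_storage/parameters/quirks)
# for id in 152d:0578 152d:9561; do
#     has_uas_quirk "$q" "$id" || echo "missing u quirk: $id"
# done
```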

After the reboot, check the devices again with lsusb -t:

/:  Bus 08.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
/:  Bus 07.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
/:  Bus 06.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M
        |__ Port 1: Dev 3, If 0, Class=Hub, Driver=hub/4p, 5000M
            |__ Port 1: Dev 5, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
            |__ Port 2: Dev 7, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
            |__ Port 3: Dev 10, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
            |__ Port 4: Dev 13, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
        |__ Port 2: Dev 4, If 0, Class=Hub, Driver=hub/4p, 5000M
            |__ Port 1: Dev 8, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
            |__ Port 2: Dev 11, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
            |__ Port 3: Dev 14, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
            |__ Port 4: Dev 17, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
        |__ Port 3: Dev 6, If 0, Class=Hub, Driver=hub/4p, 5000M
            |__ Port 1: Dev 22, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
            |__ Port 2: Dev 12, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
            |__ Port 3: Dev 16, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
            |__ Port 4: Dev 23, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
        |__ Port 4: Dev 9, If 0, Class=Hub, Driver=hub/4p, 5000M
            |__ Port 1: Dev 15, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
            |__ Port 2: Dev 18, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
            |__ Port 3: Dev 19, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
            |__ Port 4: Dev 20, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
/:  Bus 05.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 1: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 2: Dev 4, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 3: Dev 5, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 4: Dev 6, If 0, Class=Hub, Driver=hub/4p, 480M
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=ohci-platform/1p, 12M
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=ohci-platform/1p, 12M
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-platform/1p, 480M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-platform/1p, 480M

Every device that previously used the uas driver is now using usb-storage.

The system log, viewed with dmesg | grep 'scsi host' -A5 -B5, shows the same thing:

[    5.434458] usb 6-1.1.3: Manufacturer: BIAZE
[    5.434464] usb 6-1.1.3: SerialNumber: 000000000799
[    5.463418] usb 6-1.1.1: UAS is blacklisted for this device, using usb-storage instead
[    5.463436] usb-storage 6-1.1.1:1.0: USB Mass Storage device detected
[    5.465385] usb-storage 6-1.1.1:1.0: Quirks match for vid 152d pid 9561: 800000
[    5.469635] scsi host0: usb-storage 6-1.1.1:1.0
[    5.470394] usb 6-1.1.2: UAS is blacklisted for this device, using usb-storage instead
[    5.470410] usb-storage 6-1.1.2:1.0: USB Mass Storage device detected
[    5.473329] usb-storage 6-1.1.2:1.0: Quirks match for vid 152d pid 9561: 800000
...
...

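For these devices uas is now ignored and usb-storage is used instead. The 800000 in "Quirks match for vid 152d pid 9561: 800000" is the hexadecimal value of the IGNORE_UAS flag, which is what the u quirk sets.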
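The dmesg check can likewise be reduced to a list of the affected devices. A sketch, assuming the "usb <path>: UAS is blacklisted for this device" message format shown above; uas_ignored_devices is a hypothetical name:

```sh
# uas_ignored_devices: hypothetical helper. Reads kernel log text on stdin and
# prints the USB device paths (e.g. 6-1.1.1) for which the kernel reported
# "UAS is blacklisted for this device", one per line, without duplicates.
uas_ignored_devices() {
    # In "usb 6-1.1.1: UAS is blacklisted ...", capture the path before the colon;
    # -n plus the p flag prints only lines where the substitution matched.
    sed -n 's/.*usb \([0-9][0-9.-]*\): UAS is blacklisted for this device.*/\1/p' |
        LC_ALL=C sort -u
}

# Usage:
# dmesg | uas_ignored_devices
```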

That solves the problem. On other systems, such as Linux on a Raspberry Pi, if the method above does not take effect, try the following configuration instead:

$ sudo vim /boot/cmdline.txt

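Append the following to the end of the existing line, separated by a space (cmdline.txt must remain a single line):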

usb-storage.quirks=05e3:0610:u,05e3:0626:u,152d:0578:u,152d:9561:u,1d6b:0001:u,1d6b:0002:u,1d6b:0003:u

Then reboot the machine.
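Because cmdline.txt must stay on a single line, editing it with a script is safer than appending with echo >>, which would put the parameter on a new line. A minimal sketch; add_cmdline_param is a hypothetical helper that appends the parameter to the end of the existing line and skips the edit if a parameter with the same name (here usb-storage.quirks) is already present:

```sh
# add_cmdline_param: hypothetical helper. Appends a parameter to the end of
# the single line in a cmdline.txt-style file, unless a parameter with the
# same name is already there. The file is kept on one line.
add_cmdline_param() {
    param=$1
    file=$2
    name=${param%%=*}          # e.g. "usb-storage.quirks"
    # Split the line into words and look for one starting with "name=".
    if tr ' ' '\n' < "$file" |
        awk -v n="$name=" 'index($0, n) == 1 { f = 1 } END { exit !f }'; then
        return 0
    fi
    line=$(cat "$file")        # command substitution drops the trailing newline
    printf '%s %s\n' "$line" "$param" > "$file"
}

# Usage (as root, after backing up the file):
# add_cmdline_param 'usb-storage.quirks=152d:0578:u,152d:9561:u' /boot/cmdline.txt
```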

References

1 https://forum.pine64.org/showthread.php?tid=5832

2 https://forum.pine64.org/showthread.php?tid=5137

3 https://www.crifan.com/files/doc/docbook/usb_disk_driver/release/html/usb_disk_driver.html#ch02_msc_basic

4 https://leo.leung.xyz/wiki/Disable_UAS

5 https://askubuntu.com/questions/1266804/blacklist-uas-drivers-in-kernel

6 https://iitii.github.io/2019/08/05/1/

7 https://www.smartmontools.org/wiki/SAT-with-UAS-Linux