[virtuozzo 4.7.0-475.x86_64] Kernel panic - nfsd_inetaddr_event+0x68/0xa0 [nfsd]

Discussion in 'General Discussion' started by burnley, Aug 7, 2016.

  1. burnley

    burnley Kilo Poster

    Messages:
    95
    Over the weekend we had a pretty nasty crash on one of our VZ 4.7.0 nodes. The panic is (or at least was at the time, for me (c) :D) 100% reproducible. Here are the steps:
    1. Initial crash, no backtrace.
    2. After rebooting the node, all the services on the node come up properly, up to the point where the containers are started. As soon as the first container starts, panic.
    3. Rinse and repeat step 2.
    I've narrowed it down to the NFS service starting *before* the 3 VZ containers are started. To avoid the kernel panic I had to disable automatic start of the NFS server by running "chkconfig --level 2345 nfs off" and start it manually after all the containers are up and running.
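
    The manual step in the workaround above could be scripted along these lines. This is only a sketch under assumptions, not a tested fix: the function names, the expected container count and the timeout are hypothetical, and it relies on `vzlist` listing only running containers when called without `-a`.

```shell
#!/bin/sh
# Sketch: start the NFS server only after the expected number of
# containers is running. Assumes "chkconfig --level 2345 nfs off"
# has already been run so NFS no longer starts on its own at boot.

running_containers() {
    # Without -a, vzlist lists only running containers; -H drops the header.
    # awk counts the non-empty lines and prints 0 when there are none.
    vzlist -H -o ctid 2>/dev/null | awk 'NF { n++ } END { print n + 0 }'
}

start_nfs_after_containers() {
    expected=$1          # number of containers that must be running
    timeout=${2:-300}    # hypothetical default: give up after 300 seconds
    waited=0
    while [ "$(running_containers)" -lt "$expected" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for $expected running containers" >&2
            return 1
        fi
        sleep 5
        waited=$((waited + 5))
    done
    service nfs start
}
```

    On the node described here, the call would be something like `start_nfs_after_containers 3`, run from whatever late boot hook is available on the system.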

    uname -a
    Linux vz-node 2.6.32-042stab117.10 #1 SMP Fri Jul 29 23:55:56 MSK 2016 x86_64 x86_64 x86_64 GNU/Linux

    Stacktrace follows:
    [...]
    Aug 6 20:46:32 vz-node kernel: [ 657.370188] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
    Aug 6 20:46:32 vz-node kernel: [ 657.370981] IP: [<ffffffffa04bf998>] nfsd_inetaddr_event+0x68/0xa0 [nfsd]
    Aug 6 20:46:32 vz-node kernel: [ 657.371441] PGD edc904067 PUD f3a203067 PMD 0
    Aug 6 20:46:32 vz-node kernel: [ 657.372050] Oops: 0000 [#1] SMP
    Aug 6 20:46:32 vz-node kernel: [ 657.372566] last sysfs file: /sys/devices/virtual/net/venet0/address
    Aug 6 20:46:32 vz-node kernel: [ 657.372922] CPU 15
    Aug 6 20:46:32 vz-node kernel: [ 657.373016] Modules linked in: ip_vzredir(P)(U) vzredir(P)(U) vzcompat(P)(U) vzrst nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 vznetdev ip6_vzprivnet(P)(U) ip6_vzredir(P)(U) ip6_vznetstat(P)(U) ip_vzprivnet(P)(U) vziolimit vzsnap(P)(U) vzfs(P)(U) vzcpt vzlinkdev(P)(U) vzethdev vzevent vzlist(P)(U) vzstat(P)(U) vzmon ip_vznetstat(P)(U) vznetstat(P)(U) vzdquota vzdev xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle xt_multiport xt_limit xt_dscp ipt_REJECT iptable_filter ip_tables nfsd coretemp nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 tun ipmi_devintf iTCO_wdt iTCO_vendor_support dcdbas power_meter acpi_ipmi ipmi_si ipmi_msghandler joydev sb_edac edac_core lpc_ich mfd_core shpchp igb i2c_algo_bit i2c_core sg ixgbe dca ptp pps_core mdio tcp_htcp ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif ahci megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ip_vzprivnet]
    Aug 6 20:46:32 vz-node kernel: [ 657.386398]
    Aug 6 20:46:32 vz-node kernel: [ 657.386741] Pid: 8497, comm: ifconfig veid: 206 Tainted: P -- ------------ 2.6.32-042stab117.10 #1 042stab117_9 Dell Inc. PowerEdge R620/0PXXHP
    Aug 6 20:46:32 vz-node kernel: [ 657.387623] RIP: 0010:[<ffffffffa04bf998>] [<ffffffffa04bf998>] nfsd_inetaddr_event+0x68/0xa0 [nfsd]
    [...]
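
    For anyone comparing their own crash against this one, the "IP:" line of the oops names the faulting function, offset and module. A small filter like the following can pull it out of a saved kernel log. It is a sketch only: the function name `oops_symbol` is made up for illustration, and it assumes the log lines have the same layout as the excerpt above.

```shell
#!/bin/sh
# Sketch: print "symbol+offset/size [module]" from the "IP:" line of a
# kernel oops read on stdin. "RIP:" lines are deliberately not matched,
# because the pattern requires a space before "IP:".
oops_symbol() {
    sed -n 's/.* IP: \[<[0-9a-f]*>] //p'
}
```

    Typical use would be to pipe the saved kernel log through it, for example `oops_symbol < /var/log/messages`, assuming that is where the node's syslog writes kernel messages.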

    This and several other similar kernel stack traces, some with crash dumps, are in the support ticket. Happy debugging :)
     
  2. Pavel

    Pavel A.I. Auto-Responder Odin Team

    Messages:
    416
    Hello,

    Thanks for submitting a support ticket. Was there any particular reason to start a forum thread in addition to the support ticket?
     
  3. burnley

    burnley Kilo Poster

    Messages:
    95
    Hi Pavel, no particular reason other than knowledge sharing. If someone else is facing or has faced the same issue, this thread will probably be useful.
    Thanks for looking into it.
     
  4. Pavel

    Pavel A.I. Auto-Responder Odin Team

    Messages:
    416
    In that case I'll share the same knowledge I've shared with support ;)

    This bug was recently discovered internally; its ID is PSBM-50257.
    A fix will be included in one of the upcoming updates/hotfixes. Things might change, so please don't take this as a final, official statement.

    The latest "safe" kernel we know of seems to be 113.17; you can downgrade to it if the issue is causing you serious trouble.
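
    To check whether a node runs a build newer than that last known-safe one, the "stab" part of the kernel release string can be compared against 113.17. The sketch below is illustrative only: the function names are hypothetical, it assumes release strings shaped like the "2.6.32-042stab117.10" reported above, and it cannot tell whether a newer build already contains the fix.

```shell
#!/bin/sh
# Sketch: compare the "stab" build of a kernel release string with 113.17.

stab_version() {
    # "2.6.32-042stab117.10" -> "117.10"; prints nothing if there is no match.
    printf '%s\n' "$1" | sed -n 's/.*stab\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

newer_than_safe() {
    # Returns 0 if the build is newer than 113.17, 1 if it is 113.17 or
    # older, and 2 if the release string could not be parsed.
    ver=$(stab_version "$1")
    [ -n "$ver" ] || return 2
    major=${ver%%.*}
    minor=${ver#*.}
    [ "$major" -gt 113 ] || { [ "$major" -eq 113 ] && [ "$minor" -gt 17 ]; }
}
```

    For example, `newer_than_safe "$(uname -r)" && echo "kernel is newer than 113.17"` would flag the node from the first post.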
     
  5. burnley

    burnley Kilo Poster

    Messages:
    95
    Thanks Pavel. Should we see another occurrence of the same issue we'll let you know.
     
  6. watch68

    watch68 Bit Poster

    Messages:
    1
    Thank you so much! very good
     
