pfcached killed by SIGBUS?

Discussion in 'General Questions' started by deagan, Apr 3, 2017.

  1. deagan

    deagan Bit Poster

    Messages:
    9
    Hi,

    I've installed Virtuozzo 7 on two identical servers, and both now show exactly the same error. It appeared after some days of running and not on the same date (the second server hit it 20+ days after the first), but both times at 12:00:01 AM:

    ABRT has detected 1 problem(s). For more info run: abrt-cli list --since ...

    id f1e6ccf85836a9a4ecb511e12fa3d7a41d40fa63
    reason: pfcached killed by SIGBUS
    time: Sat 01 Apr 2017 12:00:01 AM CEST
    cmdline: /sbin/pfcached
    package: pfcache-7.0.20-12.vz7
    uid: 0 (root)
    count: 1
    Directory: /var/spool/abrt/ccpp-2017-04-01-00:00:01-2038

    I don't know whether this is normal. Both servers run only one container with minimal load.

    Hardware info:
    DL380G9 servers with 2 CPUs
    160 GB RAM
    7 HP SSDs in RAID 6

    If anyone has had similar problems or knows the cause, please let me know.

    Thanks in advance!
     
  2. Pavel

    Pavel A.I. Auto-Responder Staff Member

    Messages:
    478
    Hello deagan,

    I've never heard of anything like that. SIGBUS might indicate memory-management problems in the code, e.g. a double free or an access to foreign memory.
    Can you please check whether ABRT saved a coredump of the process? If you have a core file, it would help us greatly.
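The ABRT notice in the first post suggests abrt-cli list for details. As a rough alternative, here is a minimal Python sketch that walks the ABRT spool directory and reports, for each C/C++ crash directory, whether a non-empty coredump file is present, plus a few summary fields. It assumes the default spool location /var/spool/abrt and the one-file-per-field layout shown in the directory listing in this thread (reason, cmdline, package, time, coredump); the ccpp-* name pattern and the function names are illustrative choices, not part of ABRT. Reading the files requires root, since they are owned by root:abrt with mode 0640.

```python
from pathlib import Path

# One-line text files ABRT writes into each problem directory
# (names as they appear in the directory listing in this thread).
SUMMARY_FIELDS = ("reason", "cmdline", "package", "time")


def inspect_problem_dir(path):
    """Return (has_core, summary) for a single ABRT problem directory."""
    d = Path(path)
    core = d / "coredump"
    has_core = core.is_file() and core.stat().st_size > 0
    summary = {}
    for name in SUMMARY_FIELDS:
        f = d / name
        if f.is_file():
            summary[name] = f.read_text(errors="replace").strip()
    return has_core, summary


def scan_spool(spool="/var/spool/abrt"):
    """Yield (directory, has_core, summary) for each ccpp-* problem directory."""
    root = Path(spool)
    if not root.is_dir():
        return
    for d in sorted(root.glob("ccpp-*")):
        if d.is_dir():
            has_core, summary = inspect_problem_dir(d)
            yield d, has_core, summary


if __name__ == "__main__":
    for d, has_core, summary in scan_spool():
        print(f"{d}: core={'yes' if has_core else 'no'} reason={summary.get('reason', '?')}")
```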
     
  3. deagan

    Hi Pavel,

    I'm not familiar with ABRT, but I see a coredump file in the folder. I could send you the whole directory; what is the best way to send it?

    Folder:

    /var/spool/abrt/ccpp-2017-04-01-00:00:01-2038

    total 3.2M
    drwxr-x--- 2 root abrt 4.0K Apr 3 11:54 .
    drwxr-x--x 3 root abrt 4.0K Apr 1 00:00 ..
    -rw-r----- 1 root abrt 6 Apr 1 00:00 abrt_version
    -rw-r----- 1 root abrt 4 Apr 1 00:00 analyzer
    -rw-r----- 1 root abrt 6 Apr 1 00:00 architecture
    -rw-r----- 1 root abrt 249 Apr 1 00:00 cgroup
    -rw-r----- 1 root abrt 14 Apr 1 00:00 cmdline
    -rw-r----- 1 root abrt 7 Apr 1 00:00 component
    -rw-r----- 1 root abrt 2.7K Apr 1 00:00 core_backtrace
    -rw-r----- 1 root abrt 12M Apr 1 00:00 coredump
    -rw-r----- 1 root abrt 1 Apr 1 00:00 count
    -rw-r----- 1 root abrt 1.8K Apr 1 00:00 dso_list
    -rw-r----- 1 root abrt 218 Apr 1 00:00 environ
    -rw-r----- 1 root abrt 0 Apr 1 00:00 event_log
    -rw-r----- 1 root abrt 18 Apr 1 00:00 executable
    -rw-r----- 1 root abrt 82 Apr 1 00:00 exploitable
    -rw-r----- 1 root abrt 4 Apr 1 00:00 global_pid
    -rw-r----- 1 root abrt 18 Apr 1 00:00 hostname
    -rw-r----- 1 root abrt 25 Apr 1 00:00 kernel
    -rw-r----- 1 root abrt 10 Apr 1 00:00 last_occurrence
    -rw-r----- 1 root abrt 1.3K Apr 1 00:00 limits
    -rw-r----- 1 root abrt 135 Apr 1 00:00 machineid
    -rw-r----- 1 root abrt 9.8K Apr 1 00:00 maps
    -rw-r----- 1 root abrt 581 Apr 1 00:00 open_fds
    -rw-r----- 1 root abrt 259 Apr 1 00:00 os_info
    -rw-r----- 1 root abrt 29 Apr 1 00:00 os_release
    -rw-r----- 1 root abrt 21 Apr 1 00:00 package
    -rw-r----- 1 root abrt 4 Apr 1 00:00 pid
    -rw-r----- 1 root abrt 6 Apr 1 00:00 pkg_arch
    -rw-r----- 1 root abrt 1 Apr 1 00:00 pkg_epoch
    -rw-r----- 1 root abrt 7 Apr 1 00:00 pkg_name
    -rw-r----- 1 root abrt 6 Apr 1 00:00 pkg_release
    -rw-r----- 1 root abrt 6 Apr 1 00:00 pkg_version
    -rw-r----- 1 root abrt 1.2K Apr 1 00:00 proc_pid_status
    -rw-r----- 1 root abrt 1 Apr 1 00:00 pwd
    -rw-r----- 1 root abrt 25 Apr 1 00:00 reason
    -rw-r----- 1 root abrt 4 Apr 1 00:00 runlevel
    -rw-r----- 1 root abrt 10 Apr 1 00:00 time
    -rw-r----- 1 root abrt 4 Apr 1 00:00 type
    -rw-r----- 1 root abrt 1 Apr 1 00:00 uid
    -rw-r----- 1 root abrt 5 Apr 1 00:00 username
    -rw-r----- 1 root abrt 40 Apr 1 00:00 uuid
    -rw-r----- 1 root abrt 22K Apr 1 00:00 var_log_messages
     
  4. Pavel

    Yes, this information is exactly what we need.
    Could you upload it to our FTP?
    ftp://fe.virtuozzo.com/1fa8351f961c2c1c9780a5e77ebf0726/
     
  5. deagan

    Hi Pavel,

    I've uploaded the file and sent you an e-mail.
     
  6. Pavel

    Great! Thanks a lot!
     
  7. Pavel

    Hello deagan,

    We looked into the dump yesterday; unfortunately, it did not reveal the root cause.
    The problem itself is minor and should not affect your host's stability; let me know if it does cause any observable issues.

    We received similar reports today and got four more dumps, which will hopefully shed more light on the problem. For reference, the internal bug ID for this is PSBM-63453.
     
  8. Pavel

    Hello deagan,

    While we try to track down the bug from the crash dump, I'd like to ask whether you have any idea what the trigger could be.
    pfcache started crashing only recently, yet we haven't changed it for some time and it has been quite stable, so there should be some kind of trigger. Was an update being installed at 12:00 AM? Were any regular tasks running at 12:00 AM, when the crashes took place?

    Looking forward to your reply!
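To follow up on the question about regular tasks at midnight, the crontabs can be scanned for entries whose minute and hour fields both match 00:00. Below is a rough Python sketch, assuming standard five-field cron syntax as used by /etc/crontab and /etc/cron.d; the helper names are illustrative. It inspects only the minute and hour fields, so it lists jobs that run at 00:00 on at least some days, and it treats every @-macro except @reboot as running at midnight. Per-user crontabs and crontabs inside containers are not scanned.

```python
from pathlib import Path

# Macros that run at 00:00 (on at least some days); @reboot does not.
MIDNIGHT_MACROS = {"@yearly", "@annually", "@monthly", "@weekly",
                   "@daily", "@midnight", "@hourly"}


def field_matches(field, value, lo, hi):
    """True if one numeric cron field ('*', '0', '0-5', '*/15', '1,30') matches value."""
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            a, b = part.split("-", 1)
            start, end = int(a), int(b)
        else:
            start = end = int(part)
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def fires_at_midnight(line):
    """True if a crontab or /etc/cron.d line runs its job at 00:00."""
    fields = line.split()
    if not fields or fields[0].startswith("#"):
        return False
    if fields[0].startswith("@"):
        return fields[0] in MIDNIGHT_MACROS
    if "=" in fields[0] or len(fields) < 6:  # variable assignment or malformed line
        return False
    minute, hour = fields[0], fields[1]
    return field_matches(minute, 0, 0, 59) and field_matches(hour, 0, 0, 23)


def midnight_jobs(paths):
    """Return (path, line) pairs for every midnight entry in the given cron files."""
    hits = []
    for p in paths:
        try:
            text = Path(p).read_text(errors="replace")
        except OSError:  # missing or unreadable file
            continue
        hits.extend((str(p), ln.strip()) for ln in text.splitlines() if fires_at_midnight(ln))
    return hits


if __name__ == "__main__":
    files = [Path("/etc/crontab"), *sorted(Path("/etc/cron.d").glob("*"))]
    for path, line in midnight_jobs(files):
        print(f"{path}: {line}")
```

Note that the check separates 0 0 * * * (midnight) from 0 12 * * * (noon), which matters here because the crash timestamps read 12:00:01 AM.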
     
  9. Pavel

    I also wanted to ask whether you use ReadyKernel.
    The output of the following command would be helpful:
    Code:
    # readykernel info
     
  10. deagan

    Hi Pavel,

    ------------------
    # readykernel info
    Patch name: readykernel-patch-20.18-17.0-1.vl7
    Patch module: kpatch_cumulative_17_0_r1
    File: /var/lib/kpatch/3.10.0-327.36.1.vz7.20.18/kpatch-cumulative-17-0-r1.ko
    Version: 17.0

    The following issues are fixed by the ReadyKernel patch:

    v17.0 and newer:

    * CVE-2017-7308
    Integer overflows in packet_set_ring().

    v16.0 and newer:

    * CVE-2017-7184
    Local privilege escalation in XFRM framework.

    v15.0 and newer:

    * CVE-2017-2647
    Null pointer dereference in search_keyring().

    v14.0 and newer:

    * CVE-2017-6074
    dccp: double free in dccp_rcv_state_process.

    * CVE-2016-2053
    Kernel panic and system lockup by triggering BUG_ON() in public_key_verify_signature().

    v13.0 and newer:

    * PSBM-57512
    A privileged user inside a container can cause a host kernel crash in udp_lib_get_port().

    * CVE-2017-6214
    ipv4/tcp: Infinite loop in tcp_splice_read().

    v12.0 and newer:

    * PSBM-59964
    Broken isolation for some of "ip ntable" settings.

    v11.0 and newer:

    * CVE-2016-3070
    Null pointer dereference in trace_writeback_dirty_page().

    * CVE-2016-8645
    A BUG() statement can be hit in net/ipv4/tcp_input.c.

    * CVE-2016-9806
    Potential double free in netlink_dump().

    * PSBM-59983
    iptables: forwarding does not work with '--netfilter full'.

    * PSBM-57499
    NULL pointer dereference in write() -> netlink_sendmsg() -> netlink_unicast().

    * PSBM-57511
    General protection fault in sendmsg() -> netlink_sendmsg() -> netlink_unicast().

    v10.0 and newer:

    * CVE-2017-2583
    kvm: vmx/svm potential privilege escalation inside guest.

    * CVE-2017-2584
    kvm: use after free in complete_emulated_mmio.

    v8.0 and newer:

    * CVE-2015-8539
    Keys: general protection fault in trusted_update().

    * PSBM-57915
    fs/fadvise: a way was needed to deactivate pages after cached reads.

    -------------------------
    # yum update
    No packages marked for update

    The crontab on the host is empty; the one in the VPS contains:

    MAILTO=""
    0 0 * * * /usr/local/psa/admin/bin/php -dauto_prepend_file=sdk.php '/usr/local/psa/admin/plib/modules/letsencrypt/scripts/renew.php'

    I had this on two identical servers (hardware nodes), with exactly the same time (different dates, but always at 12:00 AM) and the same reports. One of them does practically nothing (it runs one Plesk VPS with one domain); the other doesn't have much to do either (one Plesk VPS with a few domains).

    I didn't run anything special at that time as far as I know.

    Could you move the files off the public FTP? Thanks :)

    If there is any more info I can provide, let me know.
     
    Last edited: Apr 6, 2017
  11. Pavel

    Thanks for the details!
    We've managed to reproduce the problem with a custom-built pfcache. There seems to be a race condition; our test binary just makes it easier to catch.
    The crash happens when /etc/cron.d/pfcache_cron, which collects statistics, is executed. You may disable that task until a fix is available.
    The root cause is not quite clear so far, but we suspect it's in the kernel. Since we have a way to reproduce the issue, I believe we'll find the reason soon enough.
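One way to disable the statistics task until the fix arrives is to comment out its entries rather than delete the file, so it is easy to restore later. A minimal Python sketch, assuming /etc/cron.d/pfcache_cron uses the usual cron.d syntax; the backup directory (/root here) and the "# disabled:" marker are arbitrary choices, not anything pfcache expects. The backup is written outside /etc/cron.d to avoid any chance of cron reading it as another job file. Run it as root, and it is worth re-checking the file after pfcache updates.

```python
import shutil
from pathlib import Path

# Cron file named in this thread; adjust if your installation differs.
CRON_FILE = Path("/etc/cron.d/pfcache_cron")


def disable_cron_file(path, backup_dir=Path("/root")):
    """Comment out every active job line of a cron.d file and return how many changed.

    The untouched original is first copied to backup_dir/<name>.bak, outside
    /etc/cron.d. Blank lines, comments and NAME=value assignments are kept as-is.
    """
    path = Path(path)
    shutil.copy2(path, Path(backup_dir) / (path.name + ".bak"))
    changed = 0
    out = []
    for line in path.read_text().splitlines(keepends=True):
        fields = line.split()
        if fields and not fields[0].startswith("#") and "=" not in fields[0]:
            out.append("# disabled: " + line)  # marker text is arbitrary
            changed += 1
        else:
            out.append(line)
    path.write_text("".join(out))
    return changed


if __name__ == "__main__":
    if CRON_FILE.is_file():
        n = disable_cron_file(CRON_FILE)
        print(f"commented out {n} line(s) in {CRON_FILE}")
    else:
        print(f"{CRON_FILE} not found; nothing to do")
```

To re-enable the task, copy the .bak file back into /etc/cron.d under its original name.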
     
