PStorage - client serviced more than 8000 ms

Discussion in 'Installation, Update and Configuration' started by futureweb, May 10, 2016.

  1. futureweb

    futureweb Tera Poster

    Messages:
    397
    Hello,
    getting some Errors on our PStorage - not many ... should I care about them, or is it "normal" that they can happen?

    Thx
    Andreas Schnederle-Wagner

     
  2. Pavel

    Pavel A.I. Auto-Responder Staff Member

    Messages:
    478
    Hello Andreas

    These messages indicate a slowdown on the pstorage side. According to the messages you posted, pstorage-mount is slowing things down - the disks and the network are fine.
    Basically it means that some I/O request took longer than 8 seconds to complete. According to your logs, the delay is always in pstorage-mount.

    I've been hearing about this problem more and more often recently; I believe it's not a coincidence and it needs to be investigated properly.
    Please contact support - this smells like a bug we need to investigate.
     
  3. futureweb

    futureweb Tera Poster

    Messages:
    397
    alright - opened Ticket #2054003 for this Case!
     
  4. Pavel

    Pavel A.I. Auto-Responder Staff Member

    Messages:
    478
    Ok, thanks.
    I've seen the recommendation - it does make sense. It might not solve the problem entirely, but it should make things better.
    Please note that the cpuunits setting does not reflect an actual limit - it is the container's weight, i.e. the share of CPU time the container gets. Also, the node's own processes have their own share, which is defined in /etc/vz/vz.conf as the VE0CPUUNITS= value.

    Having the node's CPUUNITS at 1000 and the containers' CPUUNITS at 10k+ or even 100k starves the node's processes in the CPU scheduler. Since pstorage-mount is one of them, you should consider increasing cpuunits for the node, or decreasing the cpuunits of all containers to more reasonable values. E.g. if you decrease all containers' cpuunits by a factor of 100, the distribution of CPU time between the containers stays the same, but the ve0 processes get a bit more time.

    Besides, to avoid scheduling issues for containers with a small share, it's recommended to keep the minimum and maximum CPUUNITS among the containers reasonably close together - e.g. the maximum shouldn't be more than 10 times the minimum.
    Take a look at "vzcpucheck -v" or "vzlist -aHo ctid,cpuunits" to see whether your environment is optimal in terms of cpuunits.
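    For example, a quick way to review the current weights and adjust one is shown below (the CTID 101 and the 50000 value are just placeholders for illustration, not a recommendation):
    Code:
    # vzcpucheck -v                           # per-container cpuunits plus the node's own share
    # vzlist -aHo ctid,cpuunits | sort -nk2   # spot the min/max spread at a glance
    # vzctl set 101 --cpuunits 50000 --save   # change the weight of a single container
    The node's own share (VE0CPUUNITS=) is edited directly in /etc/vz/vz.conf, as mentioned above.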
     
  5. futureweb

    futureweb Tera Poster

    Messages:
    397
    alright - I've now configured it so the Root Node gets 500,000 CPUUNITS and the CTs get 50,000 CPUUNITS - will then adjust Containers based on how important they are ... but not as widely spread as before ... ;-)


    Unfortunately it did not solve the "8000 ms" Problem ... got another Warning a few Minutes ago! :-/

    11-05-16 14:02:10 CLN WRN IO requests on client #2224 serviced more than 8000 ms

    Also updated Ticket

    PS) What about CTID 1 - it only gets 1000 Units by default - should I also set it to 50k?
     
  6. Pavel

    Pavel A.I. Auto-Responder Staff Member

    Messages:
    478
    It might not resolve the issue entirely, but it should at least improve the situation. Can you assess the volume of messages before and after the change?
    Also, you can renice the pstorage-mount process to give it a higher priority.
    E.g.
    Code:
    # renice -n -19 -p $(pgrep pstorage-mount)
    Or, you can use another nice value of your choice.
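    To confirm the new priority took effect, something like the following should work (standard tooling, nothing pstorage-specific):
    Code:
    # ps -o pid,ni,comm -p $(pgrep -d, pstorage-mount)
    The NI column should show the nice value you set, e.g. -19.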
     
  7. futureweb

    futureweb Tera Poster

    Messages:
    397
    alright - will try this one. But the Root Nodes are NOT short on CPU ... so this should normally not be a problem?!?
    When restarting a Node participating in PStorage - is it normal to have lots of those Errors?!?

    Shouldn't the Client request the Data from another CS sooner?
     
  8. Pavel

    Pavel A.I. Auto-Responder Staff Member

    Messages:
    478
    You're missing the point. The problem is not the actual CPU load, but the scheduling latency. Supposedly.
    Restarting a node shouldn't trigger such messages. These are not errors - just warnings about long I/O.

    That's not how it works. A "write" operation is not considered complete until the data has been successfully written to all chunks involved in the I/O.
     
  9. futureweb

    futureweb Tera Poster

    Messages:
    397
    alright ... now I got it! ;-)
    Unfortunately I'm not really experienced with CPU Scheduling ... if that's the "problem" - what can we do?! :)
     
  10. Pavel

    Pavel A.I. Auto-Responder Staff Member

    Messages:
    478
    Dear Andreas,

    It's a bad idea to discuss this with support and with me on the forum in parallel. That throws in a lot of confusion.
    I'll let support do their job ;)
     
  11. futureweb

    futureweb Tera Poster

    Messages:
    397
    alright - let's see what they come up with! ;-)
     
  12. SteveITS

    SteveITS Tera Poster

    Messages:
    271
    I posted in a different thread recently... after we upgraded our RAID controller firmware and Intel SSD drive firmware, these incidents went way down and almost disappeared. I suspect the Intel SSD firmware: even though we're using the recommended model, we had some really wonky behavior on a Windows server built late last year until we updated those drives' firmware.

    Intel has updated the drive firmware multiple times in the past six months.
     
  13. futureweb

    futureweb Tera Poster

    Messages:
    397
    Hi Steve,
    thx for your input. We are using Samsung SM863 SSDs. Will check if there is a new Firmware.
     
  14. SteveITS

    SteveITS Tera Poster

    Messages:
    271
  15. Pavel

    Pavel A.I. Auto-Responder Staff Member

    Messages:
    478
    Steve, that's quite interesting. Although that would only help if you have the "overflow" messages, I assume.
     
  16. futureweb

    futureweb Tera Poster

    Messages:
    397
    Hi Steve,
    thx for pointing that out - but "unfortunately" there are no overflow messages on our Servers ... just lots of "client serviced more than 8000 ms" Errors we can't get rid of ... and that on an SSD-only Storage Sub-System ... ;-)


     
  17. SteveITS

    SteveITS Tera Poster

    Messages:
    271
    Pavel, I agree that logically it should only affect the neighbour overflow message. I forgot to mention that I did go back and look, and most of our recent "8000 ms" instances did not come with the "neighbour" message. The only other change was a week earlier (the 11th), when we installed Update 11 Hotfix 11, and we got both messages after that (the 16th). However, we have not had the "8000 ms" warning since. If I do see one I'll post back, but I think this is our longest quiet period by far.

    Edit: the behavior for the "neighbour" issue was also different... the CS was up and down for (from memory) 10 minutes or thereabouts. Usually we get just 1-3 "8000 ms" event entries.
     
    Last edited: Sep 1, 2016
  18. SteveITS

    SteveITS Tera Poster

    Messages:
    271
    Best Answer
    Following up on this thread, we upgraded our cluster storage servers to Virtuozzo 7 over the past few weeks. The "serviced more than 8000 ms" messages started cropping up again and became daily by the time we were done upgrading (2-10 times or so per day, in bunches). Remembering this thread, I found that VZ 7 ships these settings in the file /etc/sysctl.d/99-vzctl.conf, which contains, among other lines:

    net.ipv4.neigh.default.gc_thresh2=2048
    net.ipv4.neigh.default.gc_thresh3=4096
    net.ipv6.neigh.default.gc_thresh2=2048
    net.ipv6.neigh.default.gc_thresh3=4096

    Notably, gc_thresh1 is left at its default of 128.

    I created a file with an alphabetically later name (so it loads last), /etc/sysctl.d/z-filename.conf, with the lines we had used in Virtuozzo 6:
    net.ipv4.neigh.default.gc_thresh3 = 8192
    net.ipv4.neigh.default.gc_thresh2 = 4096
    net.ipv4.neigh.default.gc_thresh1 = 1024
    net.ipv6.neigh.default.gc_thresh3 = 8192
    net.ipv6.neigh.default.gc_thresh2 = 4096
    net.ipv6.neigh.default.gc_thresh1 = 2048

    After running "sysctl -p /etc/sysctl.d/z-filename.conf" to import the settings, the message hasn't been logged since (4 days now).
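    For anyone following along, a rough way to confirm the values are active and to see how large the node's neighbour table actually gets (standard Linux tooling, nothing Virtuozzo-specific):
    Code:
    # sysctl net.ipv4.neigh.default.gc_thresh1 net.ipv4.neigh.default.gc_thresh2 net.ipv4.neigh.default.gc_thresh3
    # ip -4 neigh show nud all | wc -l    # current number of IPv4 neighbour entries
    If the entry count regularly approaches gc_thresh2/gc_thresh3, raising the thresholds as above is the usual remedy.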
     
  19. Pavel

    Pavel A.I. Auto-Responder Staff Member

    Messages:
    478
    Great catch Steve, and thanks for sharing this!
     
  20. SteveITS

    SteveITS Tera Poster

    Messages:
    271
    Sure. We didn't have any "8000 ms" errors for several days in a row there, but over the last few days we've had generally one per day, all but one between 4:12 am and 4:16 am. That seems oddly specific, but from looking at when our overnight backups run, I'm not sure it can be tied to them or anything like that.
     
