Hardware node died, containers migrated but didn't boot

Discussion in 'General Questions' started by RobertOe, Apr 18, 2017.

  1. RobertOe

    RobertOe Bit Poster

    Messages:
    7
    Hello guys

    Last night one of the hardware nodes decided to reboot by itself (not a good sign). The storage and the remaining servers did what they should and migrated the containers to the other servers. However, the containers all ended up down, and I had to start them manually.

    Is there a bug, or is there some setting I have not activated? As I understand it, the system should handle this kind of event by itself (and not wake me up at 3:30 am).

    The hardware nodes were running a slightly outdated version, 6.8 (the packages have version 6.11.25122.1231244-1).

    TIA
    Robert Oedegaard
     
  2. Pavel

    Pavel A.I. Auto-Responder Staff Member

    Messages:
    478
    Hello Robert,

    6.8 here is not a Virtuozzo version, but a "Virtuozzo Linux" version - the OS underneath the virtualization layer, which is RHEL-based.
    Since the packages are 6.11, you're running Virtuozzo 6 Update 11, which is sort of OK (just one major update behind, not so critical).
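    To double-check which product version a node actually runs, you can compare the OS release file with the Virtuozzo release file and kernel package (a quick sketch; exact file names can vary between releases):
    Code:
    
        # OS underneath the virtualization (Virtuozzo Linux, RHEL-based)
        cat /etc/redhat-release
    
        # Virtuozzo product version and update level
        cat /etc/virtuozzo-release
    
        # Installed Virtuozzo kernel package
        rpm -q vzkernel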

    One possibility is that the VEs were unable to start due to license limitations. A memory shortage is another.
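    As a rough first check for both (a sketch; output fields may differ slightly between updates):
    Code:
    
        # License state - look at the "status" field (ACTIVE vs. GRACED/EXPIRED)
        vzlicview
    
        # Memory and swap headroom on the nodes that received the relocated VEs
        free -m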

    The detailed reason why the VEs were relocated but did not start can be found in the shaman master's logs.
    To find out which host is the shaman master, check the "shaman stat" output:
    Code:
    
        NODE_IP           STATUS     ROLES                RESOURCES
        172.16.1.1        Active     *,~VM:QEMU,~CT:VZ7   5 CT, 15 VM, 0 Unknown, 0 ISCSI
        172.16.1.2        Active     *,~VM:QEMU,~CT:VZ7   2 CT, 3 VM, 0 Unknown, 0 ISCSI
      M 172.16.1.3        Active     *,~VM:QEMU,~CT:VZ7   30 CT, 29 VM, 0 Unknown, 0 ISCSI
        172.16.1.4        Active     *,~VM:QEMU,~CT:VZ7   2 CT, 0 VM, 0 Unknown, 0 ISCSI
    *  172.16.1.7        Inactive   *,~VM:QEMU,~CT:VZ7   0 CT, 3 VM, 0 Unknown, 0 ISCSI
    
    "M" marked node is the master.

    Shaman logs can be found in /var/log/shaman.log
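    On the master node you can search the log for the affected VEs, for example (CTID 101 here is just a placeholder; the exact log wording varies between updates):
    Code:
    
        # Replace 101 with the CTID or VM UUID in question
        grep -i 101 /var/log/shaman.log
    
        # Or browse everything around the time of the crash
        less /var/log/shaman.log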

    Also, there is an important note about shaman's behavior. Sometimes the server owner stops a container on purpose, meaning it must not be started. Therefore, when relocation occurs, VEs which were initially stopped are not started again.

    So if your server got a clean shutdown but then failed somewhere during boot, we might get a situation where relocation occurs and all VEs are considered initially stopped. It is indeed an unpleasant situation, but it's difficult to write a good algorithm to handle it.
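    Related to this, shaman only manages VEs that have the per-VE HA flag enabled. You can verify and set it with prlctl (a minimal sketch with a hypothetical VE named MyCT):
    Code:
    
        # Show whether high availability is enabled for this VE
        prlctl list -i MyCT | grep -i "high availability"
    
        # Enable HA so shaman relocates and restarts the VE after a node failure
        prlctl set MyCT --ha-enable yes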
     
  3. RobertOe

    RobertOe Bit Poster

    Messages:
    7
    Hello Pavel

    The reboot probably wasn't clean, but I suspect the license might be the issue. We only have a bit more license capacity than we need on each node, and when one of the nodes goes down, the other nodes get more VEs than the license allows.

    The "bad" node just rebooted (or, it crashed and never recoverd), so one of the other nodes has it license in Grace now.

    Oh well, I hope I'll figure out what is going on with the node and get it fixed quickly.

    Thanks a lot for your reply.

    Robert Oedegaard
     
  4. Pavel

    Pavel A.I. Auto-Responder Staff Member

    Messages:
    478
    It's a bit difficult to discuss such issues on the forum board - there are too many variables you have to check on the server itself :) I'd recommend contacting technical support; they'll help you understand what happened.
     