Cloud Storage + all SSDs

Discussion in 'Installation, Update and Configuration' started by futureweb, Feb 18, 2016.

  1. futureweb

    futureweb Tera Poster

    Messages:
    397
    Hello,

    thinking about building a Cloud Storage installation with 3 Blade Servers + 9 x 2TB SSDs in Storage Blades.
    Is there anything to take care of?
    As the OS is installed on normal HDDs - do we need to link some directories (Logs?!?) to the SSDs for performance?
    Any experience with this kind of configuration? Tips/tricks?

    Thx
    Andreas Schnederle-Wagner
     
  2. Pavel

    Pavel A.I. Auto-Responder Staff Member

    Messages:
    478
    Hello Andreas,

    It took me a while to gather all our experience on this topic and sort it out.
    Please do not hesitate to ask questions if anything is unclear.

    First of all, if your goal is a high-performance cluster, you should ensure you have a dedicated network solely for pstorage traffic. A bonded pair of 1GbE adapters might not be enough for such a cluster. I'd recommend at least 10GbE adapters with a dedicated switch.

    Next - the drives. 9 x 2TB SSDs sound nice in terms of performance; however, this approach has its own drawbacks in maintenance.
    1) The SSDs must have a high endurance rating (to withstand the heavy I/O of cluster activity and to last long)
    2) The SSDs must have power-loss protection (to ensure your data won't get corrupted)

    -- Which means the SSDs must be enterprise-level, which is going to be costly.

    3) Also, if possible, I'd recommend buying these SSDs in separate "transactions" to make sure they were not manufactured on the same day at the same factory - this decreases the chance of simultaneous drive failure (I've seen an SSD cluster built of relatively cheap disks that had ~20 disks fail in 2 days - they'd been bought in bulk and degraded simultaneously).

    4) If someday you decide to add an HDD to the cluster, overall cluster performance will degrade.
    When data is written to the storage, it is written to 3 CSes simultaneously, and the operation is not complete until all 3 replicas are written. Imagine having 2 replicas on SSD and 1 on HDD: the overall I/O operation drops to HDD speed.
    Which means you'll only be able to expand the cluster with SSDs without a performance loss.
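    The replica-bottleneck effect above can be illustrated with a toy model (not pstorage code; the millisecond figures are made-up assumptions):

    ```shell
    # Toy model: a replicated write completes only when the slowest of the
    # 3 replicas has been written, so a single HDD replica dominates the
    # whole operation. The latencies below are invented for illustration.
    ssd_ms=1
    hdd_ms=10

    slowest() {
        # latency of a replicated write = max over its replicas
        printf '%s\n' "$@" | sort -n | tail -1
    }

    echo "all-SSD write:     $(slowest "$ssd_ms" "$ssd_ms" "$ssd_ms") ms"
    echo "2xSSD + HDD write: $(slowest "$ssd_ms" "$ssd_ms" "$hdd_ms") ms"
    ```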

    5) Definitely set up monitoring of SSD disk states across the whole cluster - "pstorage" of course shows failed disks in "pstorage top"; however, there is currently no e-mail notification functionality (it will be implemented in one of the updates though - it has already been considered for implementation).

    As an alternative to an SSD-only cluster, you may consider using fast HDDs - 10k or 15k rpm disks, maybe even SAS - and using smaller (but still enterprise-grade, with the necessary features) SSDs for cache and journal.
    This option might be a bit slower, but I believe it would last longer (although I cannot predict how long 2TB enterprise-grade SSDs will stay alive under significant I/O load).


    And yes, moving logs to SSD for an SSD-only cluster would probably be a good idea. You can either symlink or bind-mount the /var/log/pstorage directory. And of course it would make sense to put pfcache on SSD as well.
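    A bind-mount sketch for the log directory (paths are examples; it assumes the SSD is already mounted at /ssd and that the pstorage services are stopped while the files are moved):

    ```shell
    # Move existing logs to the SSD and bind-mount them back in place
    mkdir -p /ssd/pstorage-logs
    cp -a /var/log/pstorage/. /ssd/pstorage-logs/
    mount --bind /ssd/pstorage-logs /var/log/pstorage

    # make the bind mount persistent across reboots
    echo '/ssd/pstorage-logs /var/log/pstorage none bind 0 0' >> /etc/fstab
    ```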
     
  3. Pavel

    Pavel A.I. Auto-Responder Staff Member

    Messages:
    478
    Just wanted to note that all these recommendations stem from SSD-related drawbacks. It's not that pstorage works with SSDs in a bad manner; these are the things you should be careful of if you're using SSDs as your only storage.
     
  4. futureweb

    futureweb Tera Poster

    Messages:
    397
    Hello Pavel,

    thank you for your detailed answer! ;-)

    We are going to set up this project with HP Blade Servers + HP Virtual Connect, so we get 2x10GBit on each server - one 10GBit network will be used for pstorage, one for normal traffic - so I guess this should do the job! ;-)
    1+2) As for the SSDs - Samsung Enterprise SSD SM863 1.92TB will be used, with a TBW of 12.320 per drive - I guess this should be enough for many years of usage. They also have tantalum capacitors for power-loss protection. (Plus it's housed in a DC with redundant power lines, UPS and an emergency power generator.)
    3) The tip about buying different batches of SSDs sounds very good - will look into that!!
    4) It will only be expanded with SSDs - never HDDs, because of the performance degradation
    5) Important information - thx - is SNMP monitoring of pstorage possible? (I would like to integrate it into our Icinga monitoring)

    Setup will be something like this:
    3x BL460c Gen9 Blade Servers + 3x Storage Blades.
    3 SSDs in every Storage Blade ... passed through to the OS (I guess I need to set up 3 failure domains for that?)

    Should "/var/log/pstorage" / pfcache be mounted onto the pstorage or onto a separate partition on the local SSD?

    Andreas
     
  5. Pavel

    Pavel A.I. Auto-Responder Staff Member

    Messages:
    478
    Hello Andreas,

    5) Currently pstorage does not provide SNMP monitoring, but since we're just interested in the SMART counters of the disks, I'm 100% sure there are plenty of implementations on the web. E.g. one I found in a couple of clicks:
    https://www.pitt-pladdy.com/blog/_20091031-144604_0000_SMART_stats_on_Cacti_via_SNMP_/
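    For example, a minimal smartmontools-based check could look like this (a sketch only: it assumes smartctl is installed, that the data disks show up as /dev/sdb../dev/sdd, and the alert address is hypothetical):

    ```shell
    # Poll SMART health for each data SSD and alert on failure.
    # Device list and mail address are examples - adjust to your environment.
    for dev in /dev/sdb /dev/sdc /dev/sdd; do
        if ! smartctl -H "$dev" | grep -q PASSED; then
            echo "SMART health check failed on $dev" \
                | mail -s "pstorage SSD alert: $dev" admin@example.com
        fi
    done
    ```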

    Currently failover domains or relocation domains are not implemented in HA. It's possible to modify shaman (our software responsible for HA) scripts manually to achieve this goal (afaik some customers did it); however, this feature is not yet available out of the box. For your reference, the feature's internal ID is PSBM-24733.
    If I understand correctly, you wish to pass physical disks to some VMs. In that case, relocating those VMs to other hosts sounds meaningless.
    It would probably be more adequate to just disable HA for these VMs and perhaps even keep them locally, not on pstorage.

    It should be placed on a local disk.
    Writing logs to pstorage would generate more logs to write to that very same pstorage, and so on :)
    I think it would be a very nice implementation of a "fork bomb", pstorage-style :)

    And pfcache must be on a local SSD for performance reasons as well - a local SSD will be faster than pstorage, even an SSD-based pstorage.
    And I doubt anyone has ever tested pfcache on pstorage. It's simply not designed to work that way.

    Strictly speaking, pstorage should store _only_ virtual machines and containers. It's optimized for large files (all files are split into 64MB chunks).
    Pstorage can store backups, but that would be a bad idea - for safety purposes, backups shouldn't be stored in the same place as production itself; they must be a separate entity.
     
  6. futureweb

    futureweb Tera Poster

    Messages:
    397
    Hi Pavel,

    correct - with Icinga I can monitor every SMART value I like ... I just need to check whether the HP P420 RAID controller exposes SMART values to the host OS when the disks are configured as passthrough.

    No passing SSDs to VMs - by passthrough I meant the RAID controller will present the disks as single disks to PCS, not as a RAID array ...
    And I was speaking of Virtuozzo Storage failure domains, not shaman. As I present 3 SSDs per server to pstorage, they should reside within the same failure domain - correct? (So Virtuozzo Storage won't store chunks of the same data on different SSDs in the same server.)

    ok - pstorage fork bomb ... hehe ... guess we don't want to try that one! ;-)
    What size do you suggest reserving for pfcache / "/var/log/pstorage" on the local SSDs?
     
  7. Pavel

    Pavel A.I. Auto-Responder Staff Member

    Messages:
    478
    Excuse me for the misunderstanding, I get it now.
    Afaik, by default the failure domain is set to host, which is correct (it will not allow more than 1 replica per host), which means you won't have to change it.
    However, if for some reason it's set to something different, you can use the following command to set it:

    Code:
    # pstorage -c <cluster_name> set-attr -R -p / failure-domain=host 
    One of the things that some people forget to change is the replication level; change this once all CSes are set up:
    Code:
    # pstorage -c <cluster_name> set-attr -R -p / replicas=3:2 
    "3:2" is most common recommended level, change to something else if that doesn't suit you.


    It depends on the size of pfcache, of course. By default it is 10 GB in size.
     
  8. futureweb

    futureweb Tera Poster

    Messages:
    397
    alright - so if I have presented 3 SSDs from 1 host, it will know they all belong to the same host and won't save the same chunks on those SSDs ... makes sense ... :)
    as for the replicas - I'm puzzled whether it would make sense to set it to 3:1, so CTs stay online in the very unlikely case that 2 nodes fail at the same time ... if it's 3:2 the CTs will fail (blocked writes) - or can there be really bad consequences to setting it to 3:1?

    One more question about the local SSD storage:
    - /var/log/pstorage <-- should be on local SSD
    - pfcache <-- should be on local storage
    - What about the CS WRITE JOURNAL?
    - What about MDS?
    - anything else that should be on local SSD for best performance?

    Just need to know what I should store on the local SSD for best performance - and how much space I should allocate for it ... (When using 9x2TB SSDs with 3 replicas = ~5.xTB usable)
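    That usable-capacity figure can be sanity-checked with a quick back-of-the-envelope calculation (the 0.9 factor is an assumed fill limit, since SSDs shouldn't be run completely full):

    ```shell
    # 9 disks x 2 TB raw, divided by 3 replicas, kept under ~90% full
    raw_tb=18
    replicas=3
    fill_limit=0.9
    awk -v r="$raw_tb" -v n="$replicas" -v f="$fill_limit" \
        'BEGIN { printf "usable capacity ~ %.1f TB\n", r / n * f }'
    ```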

    Thx
     
  9. Pavel

    Pavel A.I. Auto-Responder Staff Member

    Messages:
    478
    Hello Andreas,

    We'll discuss replicas in the separate thread you've created. I'm still thinking this question over, since it's tricky and I want to deliver the best possible answer. Please wait a bit.

    As for the storage for services:
    - /var/log/pstorage/ - not necessarily on SSD; you might as well keep it on a local disk. On second thought, there is no significant profit in storing the logs on SSD.
    - pfcache - either on SSD or not used at all (for an SSD-based pstorage cluster, placing pfcache on a local HDD might be a drawback).
    - The pstorage-mount cache and the CS journal must be on SSD if you use them. However, on SSD-based storage they might be unnecessary.
    The cache and the journal should be discussed separately, since they work in different ways.
    Journals work with CSes - data is written to the journal first and then flushed to the disk, improving write-to-CS speed. If the CSes are SSDs, that is not necessary.
    The mount cache works with pstorage-mount and keeps the most-accessed data (from that mount point) on SSD to improve read speed. If all CSes are SSDs, that wouldn't change much - reads will be fast anyway.
    - If you want to improve performance, I'd recommend placing the MDS on SSD. And, which is important, not sharing the same drive with a CS. Otherwise, high load on the CS might slow down the MDS.

    That's pretty much everything; I cannot think of anything else related to SSDs in a pstorage cluster.
     
  10. futureweb

    futureweb Tera Poster

    Messages:
    397
    Hello Pavel,

    The pstorage-mount cache and CS journal exist only if I manually create them?! So if I don't create them - and don't need them in an all-SSD environment - there's nothing to take care of here, right?

    alright - how big should the local SSD partition for the MDS be?

    Thx
    Andreas
     
  11. Pavel

    Pavel A.I. Auto-Responder Staff Member

    Messages:
    478
    Correct.

    MDS doesn't take much space.
    Check more on requirements in the documentation:
    http://updates.virtuozzo.com/doc/pc...allels_Cloud_Storage_Administrators_Guide.pdf page 12.

    Note: when planning SSD partitioning, make sure to reserve at least 10% free space. When an SSD is more than 90% full, its performance drops significantly, so it's recommended to keep it less than 90% full.
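    A simple threshold check along these lines could be wired into cron or Icinga (a sketch; /vz is an assumed mount point):

    ```shell
    # Warn when a filesystem crosses the given fill threshold.
    check_fill() {
        # usage: check_fill <mountpoint> <threshold-percent>
        local pcent
        pcent=$(df --output=pcent "$1" 2>/dev/null | tail -1 | tr -dc '0-9')
        if [ -n "$pcent" ] && [ "$pcent" -ge "$2" ]; then
            echo "WARNING: $1 is ${pcent}% full (threshold $2%)"
        fi
    }

    check_fill /vz 90
    ```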
     
  12. futureweb

    futureweb Tera Poster

    Messages:
    397
    One additional question regarding pfcache - I just read that pfcache is also responsible for IOPS dedup --> "The IOPS deduplication mechanism maintains the cache of frequently used files and puts those files into Parallels File Cache (pfcache), which resides in server memory."
    Is IOPS dedup still available if I don't use pfcache on an SSD-only pstorage, as you suggest? (in server memory?!)
     
  13. Pavel

    Pavel A.I. Auto-Responder Staff Member

    Messages:
    478
    It's not like I suggested not using it entirely. I explicitly mentioned that if you have an SSD-only pstorage, then pfcache should also be on SSD. If you place it on HDD, it might become a drawback in your setup.
    If you don't use pfcache, IOPS and memory deduplication will not happen.

    You might want to read more about the way pfcache works over here:
    http://download.parallels.com/doc/pcs/html/Parallels_Cloud_Server_Users_Guide/35083.htm
     
  14. futureweb

    futureweb Tera Poster

    Messages:
    397
    ok - I guess I misunderstood it in the first place - now I get it (I mixed it up with the pstorage-mount cache / CS journal)! So I will mount it on a dedicated SSD ...
    As for the size of the pfcache partition - do you have any advice on how large I should make it?
     
  15. Pavel

    Pavel A.I. Auto-Responder Staff Member

    Messages:
    478
    There is no need to make a separate partition for pfcache. The default size of the pfcache drive (pfcache is a ploop drive) is 10 GB. Be prepared for the fact that it will consume up to 10 GB.
     
  16. futureweb

    futureweb Tera Poster

    Messages:
    397
    Since / (and thus also /vz/) is on HDDs, I need to mount /vz/pfcache/ from an SSD, I guess? (Server blades have 15k SAS HDDs; storage blades have SSDs.)
    That's why I was speaking about its own partition :)

    10Gb ... ok ... that's not big :)
     
  17. futureweb

    futureweb Tera Poster

    Messages:
    397
    alright - all set up now! ;-)

    pcs1-mds and pfcache.hdd are on the local SSD, and no pstorage-mount cache / CS journal is configured.
    I guess it won't make sense to increase the maximum size of pfcache beyond 10GB? (As there is lots of free space on the local RAID 1 SSD ... ;-))
     
  18. Pavel

    Pavel A.I. Auto-Responder Staff Member

    Messages:
    478
    Sometimes pfcache may be short on space - although it has a mechanism for cleaning up less-often-touched data in such cases. If you notice that the pfcache disk is full, you might increase its size.
     
  19. futureweb

    futureweb Tera Poster

    Messages:
    397
    according to vzstat we are already short on space for pfcache ... ;-)

    How do I grow pfcache? Anything to take care of? Are any service restarts needed?
     
  20. Pavel

    Pavel A.I. Auto-Responder Staff Member

    Messages:
    478
    You can use the ploop utility to resize pfcache - it's a simple ploop after all:
    Code:
    # ploop resize -s 20G /vz/pfcache.hdd/DiskDescriptor.xml
     
