
Step-by-step guide

  1. Example health report showing a full OSD

    [root@overcloud-controller-1 heat-admin]# ceph health detail
    HEALTH_ERR 1 full osd(s); 2 near full osd(s); too many PGs per OSD (556 > max 300); full flag(s) set
    osd.6 is full at 95%
    osd.3 is near full at 86%
    osd.8 is near full at 86%
    too many PGs per OSD (556 > max 300)
     
    full flag(s) set
  2. First, try reducing the weight of the full OSD so that some of its data is remapped to other OSDs (a sketch for verifying the change follows this guide)

    [root@overcloud-controller-1 heat-admin]# ceph osd reweight 6 0.95
    reweighted osd.6 to 0.95 (f333)
  3. Note that, as data is remapped, more OSDs may enter the near-full state, and even though the full flag is cleared the cluster may still not reach the desired state (a monitoring sketch follows this guide)

    [root@overcloud-controller-1 heat-admin]# ceph health detail
    HEALTH_WARN 4 pgs backfill_wait; 2 pgs backfilling; 6 pgs stuck unclean; recovery 18233/2728843 objects misplaced (0.668%); 4 near full osd(s); too many PGs per OSD (556 > max 300)
    pg 7.bc is stuck unclean for 8526.609344, current state active+remapped+backfilling, last acting [3,7,6]
    pg 6.ab is stuck unclean for 19932.940385, current state active+remapped+wait_backfill, last acting [8,5,6]
    pg 6.9a is stuck unclean for 20638.188059, current state active+remapped+wait_backfill, last acting [3,2,6]
    pg 7.12 is stuck unclean for 17999.803348, current state active+remapped+backfilling, last acting [4,2,6]
    pg 6.85 is stuck unclean for 20918.188965, current state active+remapped+wait_backfill, last acting [6,2,8]
    pg 7.85 is stuck unclean for 20746.597906, current state active+remapped+wait_backfill, last acting [7,6,4]
    pg 7.85 is active+remapped+wait_backfill, acting [7,6,4]
    pg 6.85 is active+remapped+wait_backfill, acting [6,2,8]
    pg 7.12 is active+remapped+backfilling, acting [4,2,6]
    pg 6.9a is active+remapped+wait_backfill, acting [3,2,6]
    pg 6.ab is active+remapped+wait_backfill, acting [8,5,6]
    pg 7.bc is active+remapped+backfilling, acting [3,7,6]
    recovery 18233/2728843 objects misplaced (0.668%)
    osd.1 is near full at 85%
    osd.3 is near full at 86%
    osd.6 is near full at 91%
    osd.8 is near full at 86%
    too many PGs per OSD (556 > max 300)
  4. Check the replicated size (replica count) of each pool (a per-pool check is sketched after this guide)

    [root@overcloud-controller-1 heat-admin]# ceph osd dump | grep 'replicated size'
    pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
    pool 1 'backups' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 33 flags hashpspool stripe_width 0
    pool 2 'images' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 230 flags hashpspool stripe_width 0
    pool 3 'manila_data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 35 flags hashpspool stripe_width 0
    pool 4 'manila_metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 36 flags hashpspool stripe_width 0
    pool 5 'metrics' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 37 flags hashpspool stripe_width 0
    pool 6 'vms' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 229 flags hashpspool stripe_width 0
    pool 7 'volumes' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 140 flags hashpspool stripe_width 0
  5. The next step is to reduce the number of replicas on the "volumes" pool, the most heavily utilized pool, from 3 to 2 (see the caveats sketched after this guide).

    [root@overcloud-controller-1 heat-admin]# ceph osd pool set volumes size 2
    set pool 7 size to 2
    [root@overcloud-controller-1 heat-admin]# ceph osd dump | grep 'replicated size'
    pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
    pool 1 'backups' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 33 flags hashpspool stripe_width 0
    pool 2 'images' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 230 flags hashpspool stripe_width 0
    pool 3 'manila_data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 35 flags hashpspool stripe_width 0
    pool 4 'manila_metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 36 flags hashpspool stripe_width 0
    pool 5 'metrics' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 37 flags hashpspool stripe_width 0
    pool 6 'vms' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 229 flags hashpspool stripe_width 0
    pool 7 'volumes' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 254 flags hashpspool stripe_width 0
  6. Usage of the volumes pool has dropped from 92% to 55%, which is sufficient to buy time to restore the service and to carry out cleanup actions together with adding extra OSDs (a sketch for finding the largest RBD images follows this guide).

    [root@overcloud-controller-1 heat-admin]# ceph df
    GLOBAL:
        SIZE       AVAIL     RAW USED     %RAW USED 
        11172G     3990G        7182G         64.29 
    POOLS:
        NAME                ID     USED      %USED     MAX AVAIL     OBJECTS 
        rbd                 0          0         0         1200G           0 
        backups             1          0         0         1200G           0 
        images              2       380G     24.06         1200G      113193 
        manila_data         3          0         0         1200G           0 
        manila_metadata     4          0         0         1200G           0 
        metrics             5       599M      0.05         1200G      142613 
        vms                 6       500G     29.42         1200G      129025 
        volumes             7      2288G     55.95         1801G      588571
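
For step 2 above, a minimal sketch for confirming that the reweight took effect and for watching per-OSD utilization. `ceph osd df` and `ceph osd tree` are standard Ceph CLI commands, but the exact output columns depend on the Ceph release.

    # List per-OSD utilization; the REWEIGHT column should show the new
    # value (0.95 for osd.6) and %USE should start dropping as PGs move away.
    ceph osd df

    # The CRUSH tree view also shows WEIGHT and REWEIGHT per OSD.
    ceph osd tree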
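
For step 3 above, a minimal sketch for monitoring the backfill until the misplaced-object count reaches zero. The full-ratio commands are a last-resort knob and differ by release (pre-Luminous clusters such as this one use `ceph pg set_full_ratio`, Luminous and later use `ceph osd set-full-ratio`); raising the ratio only masks the problem, so revert it once space has been freed.

    # Re-check cluster health every 30 seconds until backfill completes.
    watch -n 30 'ceph health detail'

    # Follow cluster log messages live (Ctrl-C to stop).
    ceph -w

    # Last resort only, if OSDs hit the full ratio again during backfill:
    # temporarily raise the full ratio slightly, then set it back afterwards.
    # Pre-Luminous releases:
    #   ceph pg set_full_ratio 0.96
    # Luminous and later:
    #   ceph osd set-full-ratio 0.96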
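
For step 4 above, a minimal sketch for checking the replica count of a single pool without grepping the full OSD map; `ceph osd pool get` is a standard command.

    # Show only the replica count and minimum replica count of the volumes pool.
    ceph osd pool get volumes size
    ceph osd pool get volumes min_size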
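
For step 5 above, a minimal sketch of the caveat and the eventual rollback. With size 2 and min_size 2 (as shown in the dump above), any single OSD outage makes the affected PGs inactive, so treat the reduction as temporary and restore three replicas once capacity has been added.

    # Temporary state: 2 replicas, so writes block if either copy is unavailable.
    ceph osd pool get volumes min_size

    # Once extra OSDs are in place and the cluster has room again,
    # restore the original replica count (this triggers more backfill).
    ceph osd pool set volumes size 3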
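
For step 6 above, a minimal sketch for identifying cleanup candidates in the volumes pool; `rbd ls -l` and `rbd du` are standard commands, although `rbd du` can be slow for images without the fast-diff feature.

    # Provisioned size of every image in the pool.
    rbd ls -l --pool volumes

    # Actual (used) size per image; the largest images are the best cleanup candidates.
    rbd du --pool volumes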
