Decide Ceph failure domain

Currently, Ceph storage pools have their "failure domain" set to "host". This means that the replicas of each block are written to disks in different machines; however, those machines can be in the same building or even the same rack.

The problem is that if we lose a rack or a whole building at once, any data whose replicas all live there will become unavailable.

We can set the failure domain to "rack" or "zone" (= building). However, this will limit how much storage we can use, since most of our storage is in RCDC (in fact, most of it is in a single box, hsrn-ceph1: 1225.3 TB).

zone         hdd (TB)   ssd (TB)
rcdc           1635.8        9.7
370j            101.9       34.9
7e12th           67.3        0
wwh              52.8        0
12wvpl           17.5        9.7
2mt              17.5        9.7
60fifthave       17.5        9.7

Replicating 3 times across zones would only give us 130 TB of usable storage. Erasure coding would give a little more (146 TB with a 2+2 profile), but that is still far from the total capacity.
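
To see why the largest zone caps the usable space, here is a minimal sketch of the arithmetic in Python. It is not how Ceph computes capacity: the function name `usable_upper_bound` and the model are assumptions for illustration. It takes the HDD figures from the table above, assumes every object places each of its chunks in a distinct zone, and ignores full ratios, CRUSH imbalance and the SSD class. It therefore gives an idealized upper bound, which comes out above the practical estimates quoted above.

```python
def usable_upper_bound(zone_capacities_tb, data_chunks, total_chunks):
    """Idealized usable capacity (TB) when every object stores `total_chunks`
    chunks, each 1/data_chunks of the object size, in distinct zones.

    Replication x3 is data_chunks=1, total_chunks=3.
    Erasure coding k=2, m=2 is data_chunks=2, total_chunks=4.
    """
    caps = sorted(zone_capacities_tb, reverse=True)
    if len(caps) < total_chunks:
        return 0.0  # not enough zones to place every chunk separately
    # A zone can hold at most one chunk of each object. If the j largest
    # zones each hold one chunk of every object, the remaining zones must
    # absorb the other (total_chunks - j) chunks. The tightest such
    # constraint bounds the amount of data stored per chunk position.
    per_chunk_tb = min(
        sum(caps[j:]) / (total_chunks - j)
        for j in range(total_chunks)
    )
    return data_chunks * per_chunk_tb


# HDD capacity per zone in TB, copied from the table above.
hdd_tb = {
    "rcdc": 1635.8,
    "370j": 101.9,
    "7e12th": 67.3,
    "wwh": 52.8,
    "12wvpl": 17.5,
    "2mt": 17.5,
    "60fifthave": 17.5,
}

replica3 = usable_upper_bound(hdd_tb.values(), data_chunks=1, total_chunks=3)
ec_2_2 = usable_upper_bound(hdd_tb.values(), data_chunks=2, total_chunks=4)

print(f"3x replication across zones: at most {replica3:.2f} TB")
print(f"EC 2+2 across zones:         at most {ec_2_2:.2f} TB")
```

In this model the binding constraint for 3x replication is that all zones other than rcdc together must hold two of the three replicas, which is why most of the rcdc capacity cannot be used.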

Other options would be to split the RCDC zone into several zones, or to keep the current "host" failure domain.

Edited by Remi Rampin