ZFS and Ceph, what a lovely couple they make!


Stable, secure data storage that can also scale fast is probably one of the most important things in today’s data-driven world. Combining two great storage solutions gives you all of that in one. ZFS and Ceph are a couple that cannot easily be beaten!


Why is that? The short explanation is scalability. ZFS is a solution which ‘scales up’ like no other, while Ceph is built to ‘scale out’. ‘Scaling up’ means extending the storage pool with additional disks, which are fully available to the file systems that use the pool. This model is generally limited by the number of disks that can be added to a node. ‘Scaling out’ is a different way of growing storage capacity: not by adding disks (or bigger disks) to a machine or pool, but by adding storage nodes (storage servers with network, compute and storage capacity) to the existing cluster. This model is mostly limited by the bandwidth between the nodes.


That makes it far easier to grow your storage infrastructure, because you don’t have to change anything about the current hardware architecture except its capacity.
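The difference between the two growth models can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the function names, the disk sizes and the 24-slot limit are assumptions chosen for the example, and the numbers are raw capacity, before any redundancy or file system overhead.

```python
def scale_up_capacity_tb(disk_tb, disks, max_disks_per_node):
    """Raw capacity of a single node.

    Scaling up means adding disks to one node, so growth stops
    once the node's disk slots (a hypothetical limit here) are full.
    """
    return disk_tb * min(disks, max_disks_per_node)


def scale_out_capacity_tb(disk_tb, disks_per_node, nodes):
    """Raw capacity of a cluster.

    Scaling out means adding whole storage nodes, so capacity
    grows with the node count instead of hitting a per-node limit.
    """
    return disk_tb * disks_per_node * nodes


# One node with 24 slots: asking for 30 disks still yields 24 disks' worth.
print(scale_up_capacity_tb(disk_tb=10, disks=30, max_disks_per_node=24))

# Five nodes with 24 disks each.
print(scale_out_capacity_tb(disk_tb=10, disks_per_node=24, nodes=5))
```

The sketch deliberately ignores the network: in the scale-out case the bandwidth between nodes, not the slot count, becomes the practical limit.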


Easily scaling up with ZFS

ZFS is a combined file system and logical volume manager originally developed by Sun Microsystems. The name ZFS does not stand for anything; it was briefly given the backronym “Zettabyte File System”, but is no longer considered an initialism. ZFS is very scalable and can be configured very precisely. It includes extensive protection against data corruption, support for high storage capacities, efficient data compression, integrated file system and volume management, snapshots and copy-on-write clones, continuous integrity checking and automatic repair, RAID-Z, and native NFSv4 ACLs.


Unlike most file systems, ZFS combines the features of a file system and a volume manager. This means that, as opposed to other file systems, ZFS can create a file system that spans a series of drives, or a pool. Not only that: you can add storage to a pool by adding more drives. ZFS will handle partitioning and formatting.


Easily scaling out with Ceph

Ceph is a storage solution that provides applications with object, block, and file system storage, all in a single unified storage cluster. It is flexible, exceptionally reliable, and easy to manage. Ceph decouples the storage software from the underlying hardware, which enables you to build much larger storage clusters with less effort. You can keep scaling out storage clusters using economical commodity hardware, and you can replace hardware easily when it malfunctions or fails. I explained more about Ceph storage here.


The two combined

With that said, I often see organizations start using open source software-defined storage with ZFS. The growth potential of open source storage is obvious: there seems to be no limit to how fast companies’ data grows, and the storage is stable, highly redundant, cheap, and fast. Because of this, open source storage systems are ‘abused’ to the max. When this happens, the environment grows, and at some point the storage infrastructure is ten times larger than anyone imagined at the start.


Eventually the data grows so fast that the ZFS storage controller node(s) reach the maximum capacity they can handle. At that moment you will need to migrate the data to a new ZFS system, and it would be very nice to have a way to scale the storage out (combine more units) instead of only up (make units bigger). This is where Ceph storage complements ZFS. With Ceph you never have to carry out data migrations when you grow: you simply add new storage servers to increase capacity, or remove older ones, and Ceph redistributes the data to make optimal use of all the capacity of the platform (storage, compute and networking). Where ZFS can start with little hardware investment, though, Ceph requires more hardware, because it refuses to compromise on data consistency and stores all data (at least) three times.
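To get a feel for what storing everything three times means for hardware sizing, here is a minimal sketch. It assumes plain replication with the replica count of three mentioned above; the function names are made up for this example, and it ignores other overhead such as free-space headroom or alternative redundancy schemes.

```python
def usable_capacity_tb(raw_tb, replicas=3):
    """Usable capacity when every piece of data is stored `replicas` times."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return raw_tb / replicas


def raw_needed_tb(usable_tb, replicas=3):
    """Raw capacity to buy in order to end up with `usable_tb` of usable space."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return usable_tb * replicas


# 300 TB of raw disk leaves a third of that for actual data.
print(usable_capacity_tb(300))

# To store 100 TB of data you need three times as much raw disk.
print(raw_needed_tb(100))
```

This is why the entry cost of a replicated cluster is higher than that of a single ZFS node, even though growing it later is easier.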


That’s why ZFS and Ceph make such a great storage couple, each with its own specific use cases within the organization. For example, ZFS is often used for backups or for building data archives, while Ceph provides the S3 cloud storage and the virtual disk storage for virtual machines. In other cases, ZFS is used for file system storage while Ceph provides the block storage infrastructure.


And with both solutions being open source and software-defined, as you can imagine, we at Fairbanks and 42on love them both equally for their own merits, and even more as a complementary couple. And whoever said you had to choose your favorite from such a lovely couple? That makes me curious, however: do you use both solutions, or did you pick only one for your storage infrastructure?