30 August 2010 - 11:44 rbd (rados block device) status

The rados block device (rbd) is looking pretty good at this point.  The basic feature set:

  • network block device backed by objects in the Ceph distributed object store (rados)
  • thinly provisioned
  • image resizing
  • image export/import/copy/rename
  • read-only snapshots
  • revert to snapshot
  • Linux and qemu/kvm clients
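To make "thinly provisioned" concrete, here is a minimal in-memory sketch of the idea: backing objects come into existence only when something is written to them, and unwritten regions read back as zeros. The `ThinImage` class and its methods are hypothetical names used for illustration only; this is not the rbd implementation or its API, and the 4 MB default object size is borrowed from the original qemu driver announcement.

```python
class ThinImage:
    """Toy model of a thinly provisioned image (hypothetical, illustration only)."""

    def __init__(self, size, object_size=4 * 1024 * 1024):
        self.size = size                # advertised size of the virtual disk in bytes
        self.object_size = object_size  # size of each backing object in bytes
        self.objects = {}               # object index -> bytearray, created on first write

    def _check(self, offset, length):
        if offset < 0 or length < 0 or offset + length > self.size:
            raise ValueError("range falls outside the image")

    def write(self, offset, data):
        self._check(offset, len(data))
        pos = 0
        while pos < len(data):
            index, within = divmod(offset + pos, self.object_size)
            n = min(self.object_size - within, len(data) - pos)
            # Allocate the backing object lazily, only when it is first written.
            obj = self.objects.setdefault(index, bytearray(self.object_size))
            obj[within:within + n] = data[pos:pos + n]
            pos += n

    def read(self, offset, length):
        self._check(offset, length)
        out = bytearray()
        pos = 0
        while pos < length:
            index, within = divmod(offset + pos, self.object_size)
            n = min(self.object_size - within, length - pos)
            obj = self.objects.get(index)
            # Objects that were never written read back as zeros.
            out += obj[within:within + n] if obj is not None else bytes(n)
            pos += n
        return bytes(out)
```

In this model a freshly created 1 GB image holds no objects at all, and a small write that straddles an object boundary allocates exactly two of them.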

Main items on the to-do list:

  • TRIM
  • image layering/copy-on-write
  • locking

The server side components are in place in both the v0.21 releases and the unstable branch.  On the client side, there are two options.

First, qemu/kvm can be patched to map an rbd image as a block device.  The code is available in git from

Alternatively, Wido has built some patched Ubuntu 10.04 packages for both qemu and libvirt, available from

  • deb http://pcx.apt-get.eu/ubuntu lucid unofficial

The qemu/kvm patches will likely be included in the next major qemu release.

The native Linux kernel rbd driver is also quite stable, but did not make it upstream for the 2.6.36 release cycle.  We hope to have it in 2.6.37.  The code for that is available at

The main holdup there is that the addition of rbd involves refactoring a lot of the common Ceph file system client code into a libceph module that is shared by both rbd and the file system client.  This makes rebasing more difficult, so that branch may not have the most recent fixes in the master branch or the current -rc kernels.  Also, the code reorganization completely breaks my semi-automated ceph-client-standalone.git updates, so for now you can’t clone and build it as a standalone module.

For more information, see the rbd and kvm-rbd wiki pages.

posted by sage | 1 Comment | Tags: RADOS

28 August 2010 - 8:49 v0.21.2 released

This is a second bugfix release for the v0.21 series.  Changes include:

  • osd: less log noise
  • osd: mark down old heartbeat peers
  • filejournal: clean up init sequence, less confusing errors on startup
  • msgr: fix throttler leak (fixes deadlock)
  • osdmaptool: don’t crash on corrupt input
  • mds: error to client on invalid opcode
  • mds: fix ENOTEMPTY checks on rmdir
  • osd: fix race between reads and cloned objects
  • auth: fix keyring search path when $HOME not defined
  • client: fix xattr writeback
  • client: fix snap vs metadata writeback
  • osd: fix journal, btrfs throttling
  • msgr: fix memory leak on closed connections

Relevant URLs:

posted by sage | No Comments | Tags: Uncategorized

16 August 2010 - 7:02 v0.21.1 released

We made a bugfix release for v0.21 last week.  Changes include:

  • debian and rpm packaging fixes
  • mds: fixed crash on some mds->client messages
  • mds: fix snaprealm behavior on readdir (occasional client misbehavior)
  • monmaptool: man page typo
  • rados: usage fix
  • osd: fix heartbeat to/from osds (fixing osd up/down flapping)
  • osd: fix replies to dup/committed requests (fixes client hangs)
  • librados: .hpp fix
  • cclass: fix .so loading
  • cauthtool: fix man page example for fs clients
  • fix log rotation

Relevant URLs:

posted by sage | No Comments | Tags: Uncategorized

29 July 2010 - 12:00 v0.21 released

It’s been a while, but v0.21 is ready.  Most of the work this time around has been on stability.  There is one key new feature, however: RBD, the rados block device, which lets you create a virtual disk backed by objects stored in the Ceph cluster.  The images can be mapped natively by the ceph kernel module or via a driver in qemu/KVM.  Although neither of those drivers is upstream yet, the server side functionality and admin tools are in place.

Changes since v0.20 include:

  • improved logging infrastructure
  • log rotate
  • mkfs improvements
  • rbd tool, and rados class
  • mds: return ENOTEMPTY when removing directory with snapshots
  • mds: lazy io support (experimental)
  • msgr: send messages directly to connection handles (more efficient)
  • faster atomic_t via libatomic-ops
  • mon: recovery improvements, fixes (e.g. when one mon is down for a long time)
  • mon: warn on monitor clock drift
  • osd: large object support
  • osd: heartbeat improvements, fixes
  • osd: journaling fixes, improvements (bugs, better use of direct io)
  • osd: snapshot rollback op (for rbd)
  • radosgw fixes, improvements
  • many memory leaks and other bugs fixed

The project roadmap has been updated and is available via the issue tracker.

Relevant URLs:

posted by sage | 2 Comments | Tags: Releases

27 May 2010 - 10:04 v0.20.2 released

We’ve released v0.20.2 with a few bug fixes.  These include

  • initscript: drop incorrect default btrfs mount option
  • initscript: behave on ksh (ubuntu)
  • monc: monitor hunting fixes
  • osd: mkfs more robust
  • cfuse: fix mount error handling
  • ppc64: fix build problems on fedora
  • mds: misc clustering fixes
  • osd: fix recovery bug

To get it:

posted by sage | No Comments | Tags: Releases

17 May 2010 - 9:57 Linux v2.6.34 is out

Linux v2.6.34, which includes the Ceph kernel client, has been released!  This is an exciting milestone for us, and we’re pretty happy with the stability of the client code that made it into this release.  This should make it easier for people to experiment with Ceph and see how it holds up on a wide variety of systems.

Please note, however, that Ceph is still experimental and is not yet ready for use in a production environment.  We have made every effort to prevent the client from crashing your system, but it is still relatively young code.  The server side also has some known issues, and will need both time and testing to earn our trust.

posted by sage | 4 Comments | Tags: Releases

17 May 2010 - 9:40 v0.20.1 released

We’ve released a stable update with a bunch of bug fixes since v0.20.  Notably, we’ve fixed

  • mkfs problems with osd journal file
  • librados aio api issues
  • misc osd fixes (crashes, hangs)
  • inconsistent readdir results across nodes

and lots of other small stuff.  To get it:

posted by sage | 2 Comments | Tags: Releases

30 April 2010 - 15:32 v0.20 released

After a long few weeks of debugging, we’re releasing v0.20.  The goal here is to get something out prior to the v2.6.34 kernel release (which includes the Ceph client) with most of the pending improvements.  Changes since v0.19 include:

  • osd: new filestore, journaling infrastructure.  (lower latency writes, btrfs no longer strictly required)
  • msgr: wire protocol improvements
  • mds: reduced memory utilization (still more to do!)
  • auth: many auth_x cleanups and improvements
  • librados: some cleanup; C++ API now usable
  • many bug fixes throughout

There are a handful of bugs that we’ve seen but haven’t been able to reproduce reliably.  As those are fixed there will be a v0.20.1 point release.  In the meantime, work continues on v0.21.  Upcoming changes include:

  • performance improvements
  • rbd: rados block device (kvm and native linux drivers)
  • flock/fcntl lock support
  • lazy io
  • allow client reconnect even after mds has restarted (useful for clients temporarily disconnected during mds restarts)
  • cluster mds fixes

To get it:

RPMs will be included in the soon-to-be-released Fedora 13.  There is also a ceph.spec file in git to build your own.

posted by sage | 4 Comments | Tags: Releases

19 March 2010 - 14:03 RBD: rados block driver

Christian Brunner sent an initial implementation of ‘rbd’, a librados-based block driver for qemu/KVM, to the ceph-devel list last week.   A few minor nits aside, it looks pretty good and works well.  The basic idea is to stripe a VM block device over (by default) 4MB objects stored in the Ceph distributed object store.  This gives you shared block storage to facilitate VM migration between hosts and fancy things like that.  The implementation is super simple: it’s just a few hundred lines wiring the qemu storage abstraction up to librados. (This is very similar to what the Sheepdog folks are doing.)
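The striping idea can be sketched in a few lines. The following is a rough illustration, assuming the simplest possible layout in which object N holds bytes N*4MB through (N+1)*4MB - 1 of the virtual disk. The function name `split_extent` is hypothetical, and the actual driver's object naming and layout details are not covered in this post.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # default 4 MB object size mentioned above


def split_extent(offset, length, object_size=OBJECT_SIZE):
    """Split a byte range of the virtual disk into per-object pieces.

    Returns a list of (object_index, offset_in_object, piece_length) tuples,
    assuming object N covers bytes [N * object_size, (N + 1) * object_size).
    """
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    pieces = []
    end = offset + length
    while offset < end:
        index = offset // object_size    # which object this byte lands in
        within = offset % object_size    # position inside that object
        piece = min(object_size - within, end - offset)
        pieces.append((index, within, piece))
        offset += piece
    return pieces
```

A block request that fits inside one object produces a single piece; a request that straddles a 4 MB boundary is split into one piece per object it touches, and each piece could then be issued as a separate object read or write.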

We’re currently hacking together a proper rbd Linux block device for the kernel, as well, based on the osdblk device (which turns a SCSI T10 OSD object into a block device).  The goal is to make the two compatible.  At this stage you can create an rbd block device, format (mke2fs) and mount it, and it seems to work.

Both drivers will eventually get snapshot support.

Stay tuned!

posted by sage | 2 Comments | Tags: Updates

19 March 2010 - 13:53 Client merged for 2.6.34

Linus merged the Ceph client for 2.6.34 this morning, which means the next kernel release will be able to mount a Ceph file system without any additional patches or modifications.

This is a pretty big milestone for us, and we’re excited! The next few weeks will be spent hammering out client bugs and polishing the v0.20 release.

posted by sage | 3 Comments | Tags: Updates