13 November 2008 - 21:35
v0.5 release sent to linux-fsdevel, -kernel

I’ve tagged a v0.5 release, and this time sent the client portion in patch form to linux-kernel and linux-fsdevel for review.  We’ll see what happens!  It weighs in at 20k lines of code, so I’ll be impressed if anyone decides to wade through it immediately.

New in this release:

  • Lots of bug fixes, especially in the object storage area.  We’ve been doing lots of recovery testing (big thanks again to Brent Nelson at UFL for his help testing) and things have improved dramatically.  Snapshots appear to be pretty solid at this point as well.
  • The OSD storage now accepts “compound” operations that make multiple updates to an object in one go.  For now, this is just used by the MDS to set some additional attributes on directory objects that can be used by fsck-type tools.  The larger goal is for this to support higher-order object mutations for a sort of lightweight “active storage” system.  (The systems research group at UCSC has been looking at this recently.)
  • The btrfs storage layer continues to evolve.  Updated ioctl patches have been submitted to btrfs (these hit btrfs-unstable yesterday).
  • You can now forcibly unmount a ceph mount (if, say, the servers go permanently offline).
  • OSDs shut down nicely when sent SIGTERM or SIGINT.
  • OSD recovery is managed by a separate thread and (naively) throttled.
  • Too many small improvements and fixes to count.
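As a rough illustration of the SIGTERM/SIGINT item above, a daemon can catch both signals, set a flag, and let its main loop wind down cleanly. This is a minimal sketch, not the OSD's actual code; the names `g_shutdown_requested`, `on_shutdown_signal`, `install_shutdown_handlers`, and `shutdown_requested` are made up for the example.

```cpp
#include <cassert>
#include <csignal>

// Hypothetical flag, not taken from the Ceph source. sig_atomic_t is the
// type that is safe to write from inside a signal handler.
static volatile std::sig_atomic_t g_shutdown_requested = 0;

// The handler only records the request; the real cleanup happens later,
// outside signal context.
extern "C" void on_shutdown_signal(int /*signo*/) {
  g_shutdown_requested = 1;
}

// Route both SIGTERM and SIGINT to the same handler.
void install_shutdown_handlers() {
  auto prev_term = std::signal(SIGTERM, on_shutdown_signal);
  auto prev_int = std::signal(SIGINT, on_shutdown_signal);
  assert(prev_term != SIG_ERR && prev_int != SIG_ERR);
  (void)prev_term;  // silence unused warnings when NDEBUG is set
  (void)prev_int;
}

// The daemon's main loop would poll this and then flush, close, and exit.
bool shutdown_requested() {
  return g_shutdown_requested != 0;
}
```

A main loop would call `install_shutdown_handlers()` once at startup and check `shutdown_requested()` between units of work.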

Items on the todo list for the next release include:

  • ENOSPC handling.
  • Async metadata operations.  Currently all metadata updates are synchronously journaled, making a lot of operations (like untar) quite slow.  When a client is the exclusive user of a directory, we should perform these operations asynchronously, and only block on an fsync on the containing directory.  The existing file capability and internal MDS locking infrastructure should make this pretty straightforward, and the performance win will be pretty huge.
  • Fully integrated content-addressable storage (CAS) is still on the list, and most of the groundwork has already been laid.  I hope to get to it soon, although it’s certainly not at the top of the list yet.

posted by sage | 1 Comment | Tags: Releases

6 November 2008 - 10:18
lockdep for pthreads

Linux has a great tool called lockdep for identifying locking dependency problems.  Instead of waiting until an actual deadlock occurs (which may be extremely difficult to reproduce when it is timing-sensitive), lockdep keeps track of which locks are already held when any new lock is taken, and ensures that there are no cycles in the dependency graph.

The other day I was sifting through gdb backtraces decoding a deadlock bug in the OSD daemon when it occurred to me that it would be nice to have a similar tool for user space applications using pthreads.  A quick search didn’t turn up anything promising, so I put together a simple dependency checker and hooked it into Ceph’s existing Mutex and RWLock wrappers. It was surprisingly quick to put together, and it works!  I was a little disappointed to only find two real dependency bugs.  But the project also motivated me to disable recursive locking (since my lockdep doesn’t cope with that), and that turned up a half dozen other instances of laziness.
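To make the idea concrete, here is a minimal sketch of such a checker in C++. It is not the actual Ceph code: the class `LockOrderChecker` and its methods `will_lock` and `unlocked` are hypothetical names, locks are identified by plain strings, and the sketch is single-threaded (a real version would track held locks per thread and protect the graph with its own mutex). Each time a lock is about to be taken, it records an edge from every held lock to the new one, and refuses if that edge would close a cycle.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical lock-order checker; an illustration only.
class LockOrderChecker {
public:
  // Call before acquiring `name`. Returns false if taking it while the
  // currently held locks are held would create a cycle (or is recursive).
  bool will_lock(const std::string& name) {
    for (const auto& h : held_) {
      if (h == name) return false;           // recursive locking is not handled
      if (reachable(name, h)) return false;  // name ... h is already recorded,
                                             // so adding h -> name is a cycle
    }
    for (const auto& h : held_) follows_[h].insert(name);
    held_.push_back(name);
    return true;
  }

  // Call after releasing `name`. The ordering edges are kept on purpose.
  void unlocked(const std::string& name) {
    auto it = std::find(held_.begin(), held_.end(), name);
    assert(it != held_.end());
    held_.erase(it);
  }

private:
  // Depth-first search over the recorded "taken after" edges.
  bool reachable(const std::string& from, const std::string& to) const {
    std::set<std::string> seen;
    std::vector<std::string> stack{from};
    while (!stack.empty()) {
      std::string cur = stack.back();
      stack.pop_back();
      if (cur == to) return true;
      if (!seen.insert(cur).second) continue;
      auto it = follows_.find(cur);
      if (it == follows_.end()) continue;
      for (const auto& next : it->second) stack.push_back(next);
    }
    return false;
  }

  std::vector<std::string> held_;                          // locks held now
  std::map<std::string, std::set<std::string>> follows_;   // A -> {taken after A}
};
```

A mutex wrapper would call `will_lock()` just before `pthread_mutex_lock()` and `unlocked()` just after `pthread_mutex_unlock()`. The point of the design is that an ordering problem is reported the first time the two orders are both seen, even if the threads never actually collide.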

My lockdep code (C++) is here and here, plus the hooks into the mutex wrapper.

posted by sage | No Comments | Tags: Dev notes