4 December 2009 - 10:52v0.18 released

There’s a v0.18 release to match the latest posting of the kernel client code on the Linux email lists.  If there are no final issues there, that will be what I send to Linus for 2.6.33.

Most of the changes since v0.17 are bug fixes in the MDS and kclient.  The other main item is an authentication framework that restricts access to the cluster and its services to authorized clients.  Two schemes are implemented: AUTH_NONE, which does no real authentication (and is essentially equivalent to what we’ve had until now), and AUTH_CEPHX, which uses Kerberos-like tickets to mutually authenticate clients and services.
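To make the "Kerberos-like tickets" idea a bit more concrete, here is a minimal sketch of ticket-based mutual authentication.  It is not the AUTH_CEPHX wire protocol: the class names, ticket layout, and use of HMAC-SHA256 are all assumptions chosen so the example runs with only the Python standard library.

```python
import hashlib
import hmac
import json
import os
import time


def _mac(key: bytes, data: bytes) -> bytes:
    """Keyed hash used both to derive session keys and to build proofs."""
    return hmac.new(key, data, hashlib.sha256).digest()


class AuthServer:
    """Hypothetical ticket issuer that shares a long-term secret with each service."""

    def __init__(self, service_secrets):
        self.service_secrets = service_secrets

    def issue_ticket(self, client, service, lifetime=60.0):
        ticket = json.dumps({
            "client": client,
            "service": service,
            "expires": time.time() + lifetime,
            "nonce": os.urandom(8).hex(),
        }, sort_keys=True).encode()
        # The session key is bound to the ticket and the service secret.
        # A real system would hand it to the client encrypted; that step is omitted.
        session_key = _mac(self.service_secrets[service], ticket)
        return ticket, session_key


class Service:
    """Hypothetical service (think OSD or MDS) that accepts tickets."""

    def __init__(self, name, secret):
        self.name = name
        self.secret = secret

    def accept(self, ticket, challenge, client_proof):
        """Check the client's proof; return the service's own proof, or None."""
        session_key = _mac(self.secret, ticket)  # re-derived, never transmitted
        info = json.loads(ticket)
        if info["service"] != self.name or info["expires"] < time.time():
            return None
        expected = _mac(session_key, b"client" + challenge)
        if not hmac.compare_digest(client_proof, expected):
            return None
        return _mac(session_key, b"service" + challenge)


def client_connect(service, ticket, session_key):
    """Client side: prove knowledge of the session key, then verify the service."""
    challenge = os.urandom(16)
    proof = _mac(session_key, b"client" + challenge)
    reply = service.accept(ticket, challenge, proof)
    # A real protocol would also guard against replayed challenges.
    return reply is not None and hmac.compare_digest(
        reply, _mac(session_key, b"service" + challenge))
```

The point of the sketch is that the service never needs to contact the issuer: it re-derives the session key from the ticket, so a client holding the wrong key (or an expired ticket) fails, and a fake service without the shared secret cannot produce the reply the client expects.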

Changes since v0.17 include:

  • osd: basic ENOSPC handling
  • big endian fixes
  • osd: improved object -> pg hash function; selectable
  • crush: selectable hash functions
  • mds restart bug fixes
  • kclient: mds reconnect bug fixes
  • fixed mds log trimming bug
  • fixed mds cap vs snap deadlock
  • filestore: faster flushing
  • uclient,kclient: snapshot fixes
  • mds: fixed recursive accounting bug
  • uclient: fixes for 32bit clients
  • auth: ‘none’ security framework
  • mon: safely bail on write errors (e.g. ENOSPC)
  • mds: fix replay/reconnect race (causing a fast client reconnect to fail)
  • mds: misc journal replay, session fixes
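As a rough illustration of the "selectable object -> pg hash" item above, the sketch below maps an object name to a placement group using a hash chosen by name.  The two hash functions here (CRC32 and truncated SHA-1) are stand-ins picked because they ship with Python; they are assumptions, not the functions the OSD actually uses.

```python
import hashlib
import zlib

# Registry of selectable hash functions: bytes -> non-negative int.
# These particular functions are placeholders for illustration only.
HASHES = {
    "crc32": lambda data: zlib.crc32(data),
    "sha1": lambda data: int.from_bytes(hashlib.sha1(data).digest()[:4], "big"),
}


def object_to_pg(name: str, pg_count: int, hash_name: str = "crc32") -> int:
    """Map an object name to a placement group index in [0, pg_count)."""
    return HASHES[hash_name](name.encode()) % pg_count


if __name__ == "__main__":
    for hash_name in HASHES:
        counts = [0] * 8
        for i in range(1000):
            counts[object_to_pg(f"object-{i}", 8, hash_name)] += 1
        print(hash_name, counts)
```

Running it prints how 1000 object names spread over 8 placement groups for each hash, which is the property a better hash function is meant to improve.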

There is a known memory leak in the MDS in this release.  It should be fixed in the unstable git shortly.

Looking forward, the main items are:

  • stability
  • fixing a few pressing MDS performance issues
  • improving OSD interaction with btrfs (we may switch to using btrfs snapshots in place of the user transaction ioctls)
  • stability

Relevant URLs:

posted by sage | No Comments | Tags: Releases

19 October 2009 - 15:12v0.17 released

We’ve released v0.17.  This is mainly bug fixes and some monitor improvements.  Changes since v0.16 include:

  • kclient: fix >1 mds mdsmap decoding
  • kclient: fix mon subscription renewal
  • osdmap: fix encoding bug (and resulting kclient crash)
  • msgr: simplified policy, failure model, code
  • mon: less push, more pull
  • mon: clients maintain single monitor session, requests and replies are routed by the cluster
  • mon cluster expansion works (see Monitor cluster expansion)
  • osd: fix pgid parsing bug (broke restarts on clusters with 10 osds or more)

The other change with this release is that the kernel code is no longer bundled with the server code; it lives in a separate git tree.

posted by sage | No Comments | Tags: Releases

16 October 2009 - 12:55Kernel client git trees have moved

The kernel client git trees have moved to kernel.org.  The main line of development is in a kernel tree that contains the Ceph client:

 git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git

Generally speaking, the master branch will contain stable code that is ready to be pushed upstream, while the unstable branch has the bleeding edge (and may be rebased).

There is also a git tree containing just the Ceph module source.  It mirrors commits from the main tree (for fs/ceph/* only), so there is a useful history, and it also contains ‘backport’ branches that will build on older kernels.

git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client-standalone.git

The userspace server side code (ceph.git) hasn’t moved; it’s still at

git://ceph.newdream.net/ceph.git

Enjoy!

posted by sage | No Comments | Tags: Uncategorized

5 October 2009 - 15:12v0.16 released

We’ve released v0.16.  The release primarily incorporates feedback on the Linux kernel client from LKML.  Changes since v0.15 include:

  • kclient: corrected inline abuse, use of __init, sockaddr_storage (IPv6 groundwork), and other feedback
  • kclient: xattr cleanups
  • kclient: fix invalidate lockup bug
  • kclient: fix msgr queue accounting lockup bug

Andrew Morton was nice enough to take some time to look at v0.15 and, “unless others emit convincing squeaks,” suggested I ask Stephen to include it in linux-next and send Linus a pull request for 2.6.33.  Yay!  With luck this will be the last version spammed to LKML in its entirety.

Meanwhile, Yehuda is continuing work on the security infrastructure to provide mutual trust between monitors, MDSs, OSDs, and clients, and Greg is working on some odds and ends (monitor cluster expansion, libceph/fuse/Hadoop client improvements).

Here are the relevant URLs:

P.S. I’d like to start building up-to-date RPMs as well.  If anyone wants to help get ceph.spec in sync with the Debian packages, that would be great.

posted by sage | No Comments | Tags: Releases

22 September 2009 - 10:08Ceph talk at LCA2010

I’ll be giving a talk on Ceph at linux.conf.au 2010!  (Oddly enough, it’s in New Zealand this year, but I’m not complaining.)  I’ve heard great things about LCA, and am looking forward to being there.

The talk will cover two general areas: Ceph’s RADOS object storage architecture, including some of its data processing features, and the distributed file system that’s built on top of it.  The goal is to make it useful for administrators interested in a scalable file system, and for developers working on cloud computing applications in need of a scalable storage and computing platform.

posted by sage | No Comments | Tags: Updates

22 September 2009 - 9:49v0.15 released

We’ve released v0.15.  This is mostly cleanups for the kernel client and some work on the monitor interface.  Changes since v0.14 include:

  • kclient: message api fixups (simpler, more robust)
  • kclient: more message pools (avoiding ENOMEM)
  • kclient: new ioctl to extract object name and location/address, given a file handle and offset
  • kclient: fix for osd restart handling
  • msgr: internal interface improvements (session tracking)
  • monitor: interface/protocol cleanup, better session tracking
  • monclient: lots of fixes, improvements
  • debian: fixed permissions on headers in -dev packages; new radosgw package (S3 compatible REST interface to object store)

So nothing too groundbreaking feature-wise, mostly just bug fixes and internal code cleanups, plus the radosgw package, which lets you point existing applications using the S3 storage service at a Ceph object store.
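For a feel of what the new kclient ioctl in the list above reports (object name for a given file and offset), here is a sketch of the underlying arithmetic.  It assumes the simplest possible layout, with a file cut into fixed-size objects and no striping across them; the 4 MB object size and the name format are hypothetical and used only for illustration.  It does not call the ioctl itself.

```python
def locate(ino: int, offset: int, object_size: int = 4 * 1024 * 1024):
    """Return (object name, offset within that object) for a file offset.

    Assumes a plain layout: object N holds bytes [N * object_size,
    (N + 1) * object_size) of the file.  The name format is illustrative.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    index, within = divmod(offset, object_size)
    return f"{ino:x}.{index:08x}", within


if __name__ == "__main__":
    print(locate(0x1234, 0))
    print(locate(0x1234, 4 * 1024 * 1024 + 5))
```

The location half of the answer (which OSDs hold that object) would come from the cluster map and is outside the scope of this sketch.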

Here are the relevant URLs:

posted by sage | No Comments | Tags: Releases

8 September 2009 - 14:39v0.14 released

We’ve released v0.14.  Changes since v0.13 include:

  • Messenger library changes (client now initiates all tcp connections)
  • Improved client/monitor protocol
  • Working Hadoop and Hypertable file system modules (many associated libceph, uclient fixes)
  • man page fixes
  • Debian packages fixed (now libcrush, libcrush-dev, librados, librados-dev, libceph, libceph-dev all work)
  • Streamlined client startup (fewer messages, faster client id assignment)

The messaging changes are the big item here.  They greatly simplify the implementation for the kernel client.  The monitor interface is also improved: clients maintain an open session and ‘subscribe’ to the map updates they want (generally, all MDS maps, and the next OSD map only when I/O stalls).  This also simplifies things on the monitor, and interestingly brings the monitor design somewhat closer to Zookeeper and CLD.
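The subscription model described above can be sketched roughly as follows.  This is a toy in-memory model, not the monitor protocol: the `Monitor` and `Subscription` classes, the `onetime` flag, and the way epochs are numbered are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Subscription:
    start_epoch: int  # first epoch the client still needs
    onetime: bool     # drop the subscription after one delivery


@dataclass
class Monitor:
    maps: dict = field(default_factory=lambda: {"mdsmap": [], "osdmap": []})
    subs: dict = field(default_factory=dict)  # (client, kind) -> Subscription
    sent: list = field(default_factory=list)  # log of (client, kind, epoch)

    def subscribe(self, client, kind, start_epoch, onetime=False):
        self.subs[(client, kind)] = Subscription(start_epoch, onetime)
        self._push(client, kind)  # deliver anything already available

    def publish(self, kind):
        """Create the next epoch of a map and push it to its subscribers."""
        self.maps[kind].append(len(self.maps[kind]) + 1)
        for client, sub_kind in list(self.subs):
            if sub_kind == kind:
                self._push(client, kind)

    def _push(self, client, kind):
        sub = self.subs[(client, kind)]
        pending = [e for e in self.maps[kind] if e >= sub.start_epoch]
        for epoch in pending:
            self.sent.append((client, kind, epoch))
        if pending:
            if sub.onetime:
                del self.subs[(client, kind)]
            else:
                sub.start_epoch = pending[-1] + 1


if __name__ == "__main__":
    mon = Monitor()
    mon.subscribe("client.1", "mdsmap", 1)                # wants every MDS map
    mon.publish("mdsmap")
    mon.publish("osdmap")                                  # nobody subscribed yet
    mon.subscribe("client.1", "osdmap", 2, onetime=True)   # I/O stalled: next map only
    mon.publish("osdmap")
    mon.publish("osdmap")                                  # not delivered
    print(mon.sent)
```

A standing subscription keeps advancing its start epoch, while a one-time subscription is dropped after the first delivery, which matches the "next OSD map only when I/O stalls" usage.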

We’re currently working on the security infrastructure (mutual authentication of clients, MDSs, OSDs, monitors), the Hadoop and Hypertable file system modules, and getting the kernel client in shape for a merge upstream.

Here are the relevant URLs:

posted by sage | No Comments | Tags: Releases

24 August 2009 - 9:55v0.13 released

We’ve made a v0.13 release.  This mostly fixes bugs with v0.12 that have come up over the past couple weeks:

  • [ku]client: fix sync read vs eof, lseek(…, SEEK_END)
  • mds: misc bug fixes for multiclient file access

But also a few other big things:

  • osd: stay active during backlog generation
  • osdmap: override mappings (pg_temp)
  • kclient: some improvements in kmalloc, memory preallocation

The OSD changes mean that the storage cluster can temporarily delegate authority for a placement group to the node that has the complete data while an index is being generated for recovery (that can take a while). Once that’s ready, control will fall back to the new/correct node and the usual recovery will kick in.
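Here is a small sketch of how an override mapping like pg_temp can sit on top of the normally computed placement.  The `OSDMap` class, its field names, and the convention that the first OSD in the list is in charge are assumptions for illustration, not the actual osdmap structures.

```python
from dataclasses import dataclass, field


@dataclass
class OSDMap:
    computed: dict                               # pg id -> OSDs from normal placement
    pg_temp: dict = field(default_factory=dict)  # pg id -> temporary override

    def acting(self, pg):
        """OSDs currently responsible for a pg; an override wins if present."""
        return self.pg_temp.get(pg, self.computed[pg])


if __name__ == "__main__":
    # Placement says osd 3 should lead pg "1.0", but osd 3 is new and
    # osd 1 is the node that still has the complete data.
    osdmap = OSDMap(computed={"1.0": [3, 1]})
    osdmap.pg_temp["1.0"] = [1, 3]   # delegate authority while the index is built
    print(osdmap.acting("1.0"))      # osd 1 leads for now
    del osdmap.pg_temp["1.0"]        # index ready: fall back to the computed node
    print(osdmap.acting("1.0"))
```

Because the override lives in the map rather than in the placement calculation, removing the entry is all it takes to hand control back to the new/correct node.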

The disk format and wire protocols have changed with this version.

We’re continuing to work on the security infrastructure… hopefully will be ready for v0.14.

Here are the relevant URLs:

posted by sage | No Comments | Tags: Releases

5 August 2009 - 14:39v0.12 released

I’ve just tagged a v0.12 release, and sent the kernel client patchset off to the Linux kernel and fsdevel lists again.  There was a v0.11 a week ago as well that incorporated some earlier feedback from the kernel lists.

Changes since v0.11:

  • mapping_set_error on failed writepage
  • document correct debugfs mount point
  • simplify layout/striping ioctls
  • removed bad kmalloc in writepages
  • use mempools for writeback allocations where appropriate (*)
  • fixed a problem with capability, snap metadata writeback
  • cleaned up f(data)sync wrt metadata writeback
  • fixed a messenger bug causing random EBADF
  • some mds clustering fixes

And since v0.10:

  • server-specified max file size
  • kclient: simplified pr_debug macro
  • kclient: respond to control-c on mount
  • kclient: misc cleanups, fixes (LKML review)
  • mount updates /etc/mtab

Testing on our 100TB cluster is going well.  Planned items for v0.13 include:

  • improved availability of OSDs when cluster membership changes
  • client authentication
  • S3 compatible REST gateway for RADOS object store
  • Ceph file system module for Hadoop

* There are still some potential OOM situations during writeback from the messaging layer, but the fixes for that are planned for a bit later when it’s clear the messaging protocol isn’t going to change further.

posted by sage | No Comments | Tags: Releases

16 July 2009 - 9:40v0.10 released

We’ve released v0.10.  The big items this time around:

  • kernel client: some cleanup, unaligned memory access fixes
  • much debugging of MDS recovery: the kernel client will now correctly untar and compile a kernel while the MDS server runs in a 60-second restart loop.
  • a few misc mds fixes
  • osd recovery fixes
  • userspace client: many bug fixes, now quite stable
  • librados improvements

Also,

  • libceph: a thin wrapper around the POSIXy ceph interface

which is being used to write a file system ‘Broker’ for the Hypertable distributed database project.  We’re also planning on (finally) getting the Hadoop ceph client in working order.

We’re also continuing to work on the librados object storage layer, including a standalone fastcgi-based gateway exposing an S3-compatible RESTful interface, the goal being a drop-in replacement for apps using S3. (It won’t let you use the rados snapshots or object classes, though, and won’t scale as efficiently.)

As far as testing goes, we’re filling up a 100TB cluster locally and will start failure testing on that shortly.  And this past week we’ve been thoroughly testing (single-node) MDS recovery.  Next up is looping OSD restarts and power cycling.

Major todo items coming up next:

  • client authentication
  • additional metadata to facilitate catastrophic rebuild of fs hierarchy
  • stabilize clustered mds

We’ve also sent the Linux kernel client code off to LKML and -fsdevel again, and are continuing to work toward a merge into the mainline kernel.

UPDATE: Here are the relevant URLs:

posted by sage | 1 Comment | Tags: Releases