From: "Stephen C. Tweedie" <sct@redhat.com>

For the past few months there has been a slow but steady trickle of reports
of oopses in kjournald.  Recently I got a couple of reports that were
repeatable enough to rerun with extra debugging code.

It turns out that we're releasing a journal_head while it is still linked
onto the transaction's t_locked_list.  The exact location is in
journal_unmap_buffer().  On several exit paths, that function does:

		spin_unlock(&journal->j_list_lock);
		jbd_unlock_bh_state(bh);
		spin_unlock(&journal->j_state_lock);
		journal_put_journal_head(jh);

releasing the jh *after* dropping the j_list_lock, the bh_state lock and
the j_state_lock.

Meanwhile, kjournald can be running journal_commit_transaction():

	spin_lock(&journal->j_list_lock);
...
		if (buffer_locked(bh)) {
			BUFFER_TRACE(bh, "locked");
			if (!inverted_lock(journal, bh))
				goto write_out_data;
			__journal_unfile_buffer(jh);
			__journal_file_buffer(jh, commit_transaction,
						BJ_Locked);
			jbd_unlock_bh_state(bh);

The problem happens if journal_unmap_buffer()'s own put_journal_head()
manages to get in between kjournald's *unfile_buffer and the following
*file_buffer.  Because journal_unmap_buffer() has dropped its bh_state lock
by this point, there's nothing to prevent this, leading to a variety of
unpleasant situations.  In particular, the jh is unfiled at this point, so
there's nothing to stop the put_journal_head() from freeing the memory
we're just about to link onto the BJ_Locked list.

I _think_ that the attached patch deals with this, but I'm still awaiting
further testing to be sure.  I thought I might as well get some other ext3
eyes on it while I wait for that -- I'll let you know as soon as I hear
back from the other testing.

The patch works by making sure that the various exits from
journal_unmap_buffer() always call journal_put_journal_head() *before*
unlocking the j_list_lock.  This is correct according to the documented
lock ranking, and it also matches the order in the existing main exit path
at the end of the function.
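
The race and the reordering described above can be modelled with a toy,
single-threaded sketch.  This is not JBD code: struct toy_jh and the toy_*
helpers are hypothetical names made up for illustration, and the rule
"free when the count reaches zero and the jh is not filed" is a simplified
assumption about what journal_put_journal_head() does.  The two functions
replay the two orderings step by step and report whether the jh had been
freed by the time kjournald refiles it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct journal_head (hypothetical, illustration only). */
struct toy_jh {
	int  b_jcount;	/* outstanding references */
	bool on_list;	/* filed on some transaction list? */
	bool freed;	/* set instead of really freeing, so it can be inspected */
};

/*
 * Simplified model of journal_put_journal_head(): drop one reference and
 * "free" the jh if it is now unreferenced and not filed on any list.
 */
static void toy_put(struct toy_jh *jh)
{
	assert(jh->b_jcount > 0);
	if (--jh->b_jcount == 0 && !jh->on_list)
		jh->freed = true;
}

/* Models __journal_unfile_buffer() and __journal_file_buffer(). */
static void toy_unfile(struct toy_jh *jh) { jh->on_list = false; }
static void toy_file(struct toy_jh *jh)   { jh->on_list = true; }

/*
 * Old ordering: the unmap path has already dropped its locks, so its put
 * can land between kjournald's unfile and file.
 * Returns true if the jh was already freed when it gets refiled.
 */
static bool toy_racy_interleaving(void)
{
	struct toy_jh jh = { .b_jcount = 1, .on_list = true, .freed = false };
	bool freed_before_refile;

	toy_unfile(&jh);		/* kjournald: unfile */
	toy_put(&jh);			/* unmap path: late, unlocked put */
	freed_before_refile = jh.freed;
	toy_file(&jh);			/* kjournald: file onto BJ_Locked */
	return freed_before_refile;
}

/*
 * New ordering: the put happens while the unmap path still holds its
 * locks, so it cannot fall between the unfile and the file.  Here it is
 * modelled as running before kjournald touches the jh.
 */
static bool toy_fixed_interleaving(void)
{
	struct toy_jh jh = { .b_jcount = 1, .on_list = true, .freed = false };
	bool freed_before_refile;

	toy_put(&jh);			/* unmap path: put under the locks */
	toy_unfile(&jh);		/* kjournald: unfile */
	freed_before_refile = jh.freed;
	toy_file(&jh);			/* kjournald: file onto BJ_Locked */
	return freed_before_refile;
}
```

In this model toy_racy_interleaving() reports that the jh was freed before
being refiled, while toy_fixed_interleaving() does not.  The real code
relies on the locks, not on call order within one thread, to exclude the
bad interleaving.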

Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 /dev/null                    |    0 
 25-akpm/fs/jbd/transaction.c |    6 +++---
 2 files changed, 3 insertions(+), 3 deletions(-)

diff -puN fs/jbd/transaction.c~ext3-jbd-race-releasing-in-use-journal_heads fs/jbd/transaction.c
--- 25/fs/jbd/transaction.c~ext3-jbd-race-releasing-in-use-journal_heads	Fri Mar  4 15:47:50 2005
+++ 25-akpm/fs/jbd/transaction.c	Fri Mar  4 15:47:50 2005
@@ -1774,10 +1774,10 @@ static int journal_unmap_buffer(journal_
 			JBUFFER_TRACE(jh, "checkpointed: add to BJ_Forget");
 			ret = __dispose_buffer(jh,
 					journal->j_running_transaction);
+			journal_put_journal_head(jh);
 			spin_unlock(&journal->j_list_lock);
 			jbd_unlock_bh_state(bh);
 			spin_unlock(&journal->j_state_lock);
-			journal_put_journal_head(jh);
 			return ret;
 		} else {
 			/* There is no currently-running transaction. So the
@@ -1788,10 +1788,10 @@ static int journal_unmap_buffer(journal_
 				JBUFFER_TRACE(jh, "give to committing trans");
 				ret = __dispose_buffer(jh,
 					journal->j_committing_transaction);
+				journal_put_journal_head(jh);
 				spin_unlock(&journal->j_list_lock);
 				jbd_unlock_bh_state(bh);
 				spin_unlock(&journal->j_state_lock);
-				journal_put_journal_head(jh);
 				return ret;
 			} else {
 				/* The orphan record's transaction has
@@ -1812,10 +1812,10 @@ static int journal_unmap_buffer(journal_
 					journal->j_running_transaction);
 			jh->b_next_transaction = NULL;
 		}
+		journal_put_journal_head(jh);
 		spin_unlock(&journal->j_list_lock);
 		jbd_unlock_bh_state(bh);
 		spin_unlock(&journal->j_state_lock);
-		journal_put_journal_head(jh);
 		return 0;
 	} else {
 		/* Good, the buffer belongs to the running transaction.
_