Re: Possible UDF locking error?

From: Jan Kara
Date: Wed Apr 03 2019 - 04:07:06 EST


Hi,

On Sat 30-03-19 14:49:46, Steve Magnani wrote:
> On 3/25/19 11:42 AM, Jan Kara wrote:
> > Hi!
> >
> > On Sat 23-03-19 15:14:05, Steve Magnani wrote:
> > > I have been hunting a UDF bug that occasionally results in generation
> > > of an Allocation Extent Descriptor with an incorrect tagLocation. So
> > > far I haven't been able to see a path through the code that could
> > > cause that. But, I noticed some inconsistency in locking during
> > > AED generation and wonder if it could result in random corruption.
> > >
> > > The function udf_update_inode() has this general pattern:
> > >
> > > bh = udf_tgetblk(...); // calls sb_getblk()
> > > lock_buffer(bh);
> > > memset(bh->b_data, 0, inode->i_sb->s_blocksize);
> > > // <snip>other code to populate FE/EFE data in the block</snip>
> > > set_buffer_uptodate(bh);
> > > unlock_buffer(bh);
> > > mark_buffer_dirty(bh);
> > >
> > > This I can understand - the lock is held for as long as the buffer
> > > contents are being assembled.
> > >
> > > In contrast, udf_setup_indirect_aext(), which constructs an AED,
> > > has this sequence:
> > >
> > > bh = udf_tgetblk(...); // calls sb_getblk()
> > > lock_buffer(bh);
> > > memset(bh->b_data, 0, inode->i_sb->s_blocksize);
> > >
> > > set_buffer_uptodate(bh);
> > > unlock_buffer(bh);
> > > mark_buffer_dirty_inode(bh);
> > >
> > > // <snip>other code to populate AED data in the block</snip>
> > >
> > > In this case the population of the block occurs without
> > > the protection of the lock.
> > >
> > > Because the block has been marked dirty, does this mean that
> > > writeback could occur at any point during population?
> > Yes. Thanks for noticing this!
> >
> > > There is one path through udf_setup_indirect_aext() where
> > > mark_buffer_dirty_inode() gets called again after population is
> > > complete, which I suppose could heal a partial writeout, but there is
> > > also another path in which the buffer does not get marked dirty again.
> > Generally, we add new extents to the created indirect extent which dirties
> > the buffer and that should fix the problem. But you are definitely right
> > that the code is suspicious and should be fixed. Will you send a patch?
>
> I did a little archaeology to see how the code evolved to this point. It's
> been like this a long time.
>
> I also did some research to understand why filesystems use lock_buffer()
> sometimes but not others. For example, the FAT driver never calls it. I ran
> across this thread from 2011:
>
> https://lkml.org/lkml/2011/5/16/402
>
> ...from which I conclude that while it is correct in a strict sense to hold
> a lock on a buffer any time its contents are being modified, performance
> considerations make it preferable (or at least reasonable) to make some
> modifications without a lock provided it's known that a subsequent write-out
> will "fix" any potential partial write out before anyone else tries to read
> the block.

Understood but UDF (and neither FAT) are really that performance critical.
If you look for performance, you'd certainly pick a different filesystem.
UDF is mainly for data interchange so it should work reasonably for copy-in
copy-out style of workloads, the rest isn't that important. So there
correctness and simplicity is preferred over performance.

> I doubt that UDF sees common use with DIF/DIX block devices,
> which might make a decision in favor of performance a little easier. Since
> the FAT driver doesn't contain Darrick's proposed changes I assume a
> decision was made that performance was more important there.
>
> Certainly the call to udf_setup_indirect_aext() from udf_add_aext() meets
> those criteria. But udf_table_free_blocks() may not dirty the AED block.
>
> So if this looks reasonable I will resend as a formal patch:
>
> --- a/fs/udf/inode.c 2019-03-30 11:28:38.637759458 -0500
> +++ b/fs/udf/inode.c 2019-03-30 11:33:00.357761250 -0500
> @@ -1873,9 +1873,6 @@ int udf_setup_indirect_aext(struct inode
> return -EIO;
> lock_buffer(bh);
> memset(bh->b_data, 0x00, sb->s_blocksize);
> - set_buffer_uptodate(bh);
> - unlock_buffer(bh);
> - mark_buffer_dirty_inode(bh, inode);
> aed = (struct allocExtDesc *)(bh->b_data);
> if (!UDF_QUERY_FLAG(sb, UDF_FLAG_STRICT)) {
> @@ -1890,6 +1887,9 @@ int udf_setup_indirect_aext(struct inode
> udf_new_tag(bh->b_data, TAG_IDENT_AED, ver, 1, block,
> sizeof(struct tag));
> + set_buffer_uptodate(bh);
> + unlock_buffer(bh);
> +
> nepos.block = neloc;
> nepos.offset = sizeof(struct allocExtDesc);
> nepos.bh = bh;
> @@ -1913,6 +1913,8 @@ int udf_setup_indirect_aext(struct inode
> } else {
> __udf_add_aext(inode, epos, &nepos.block,
> sb->s_blocksize | EXT_NEXT_EXTENT_ALLOCDECS, 0);
> + /* Make sure completed AED gets written out */
> + mark_buffer_dirty_inode(nepos.bh, inode);

Why do you mark the buffer as dirty only here? I'd just mark it dirty after
unlocking. If __udf_add_aext() or udf_write_aext() modify the buffer, they
will mark it as dirty as well... Thanks!

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR