RE: [PATCH V2 for-6.10/block 1/2] loop: Fix a race between loop detach and loop open
From: Gulam Mohamed
Date: Wed May 22 2024 - 15:11:26 EST
Hi Jens,
> -----Original Message-----
> From: Jens Axboe <axboe@xxxxxxxxx>
> Sent: Wednesday, May 22, 2024 7:10 AM
> To: Gulam Mohamed <gulam.mohamed@xxxxxxxxxx>; linux-
> block@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Cc: shinichiro.kawasaki@xxxxxxx; chaitanyak@xxxxxxxxxx; hch@xxxxxx
> Subject: Re: [PATCH V2 for-6.10/block 1/2] loop: Fix a race between loop
> detach and loop open
>
> On 5/21/24 4:42 PM, Gulam Mohamed wrote:
> > Description
> > ===========
> >
> > 1. Userspace sends the command "losetup -d" which uses the open() call
> > to open the device
> > 2. Kernel receives the ioctl command "LOOP_CLR_FD" which calls the
> > function loop_clr_fd()
> > 3. If LOOP_CLR_FD is the first command received at the time, then the
> > AUTOCLEAR flag is not set and deletion of the
> > loop device proceeds ahead and scans the partitions (drop/add
> > partitions)
> >
> > if (disk_openers(lo->lo_disk) > 1) {
> > lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
> > loop_global_unlock(lo, true);
> > return 0;
> > }
> >
> > 4. Before scanning partitions, it will check to see if any partition of
> > the loop device is currently opened 5. If any partition is
> > opened, then it will return EBUSY:
> >
> > if (disk->open_partitions)
> > return -EBUSY;
> > 6. So, after receiving the "LOOP_CLR_FD" command and just before the
> above
> > check for open_partitions, if any other command
> > (like blkid) opens any partition of the loop device, then the partition
> > scan will not proceed and EBUSY is returned as shown in above code
> > 7. But in "__loop_clr_fd()", this EBUSY error is not propagated 8. We
> > have noticed that this is causing the partitions of the loop to
> > remain stale even after the loop device is detached resulting in the
> > IO errors on the partitions
> >
> > Fix
> > ---
> > Re-introduce the lo_open() call to restrict any process to open the
> > loop device when its being detached
> >
> > Test case
> > =========
> > Test case involves the following two scripts:
> >
> > script1.sh
> > ----------
> > while [ 1 ];
> > do
> > losetup -P -f /home/opt/looptest/test10.img
> > blkid /dev/loop0p1
> > done
> >
> > script2.sh
> > ----------
> > while [ 1 ];
> > do
> > losetup -d /dev/loop0
> > done
> >
> > Without fix, the following IO errors have been observed:
> >
> > kernel: __loop_clr_fd: partition scan of loop0 failed (rc=-16)
> > kernel: I/O error, dev loop0, sector 20971392 op 0x0:(READ) flags 0x80700
> > phys_seg 1 prio class 0
> > kernel: I/O error, dev loop0, sector 108868 op 0x0:(READ) flags 0x0
> > phys_seg 1 prio class 0
> > kernel: Buffer I/O error on dev loop0p1, logical block 27201, async page
> > read
> >
> > V1->V2:
> > Added a test case, 010, in blktests in tests/loop/
> > Signed-off-by: Gulam Mohamed <gulam.mohamed@xxxxxxxxxx>
> > ---
> > drivers/block/loop.c | 19 +++++++++++++++++++
> > 1 file changed, 19 insertions(+)
> >
> > diff --git a/drivers/block/loop.c b/drivers/block/loop.c index
> > 28a95fd366fe..9a235d8c062d 100644
> > --- a/drivers/block/loop.c
> > +++ b/drivers/block/loop.c
> > @@ -1717,6 +1717,24 @@ static int lo_compat_ioctl(struct block_device
> > *bdev, blk_mode_t mode, } #endif
> >
> > +static int lo_open(struct gendisk *disk, blk_mode_t mode) {
> > + struct loop_device *lo = disk->private_data;
> > + int err;
> > +
> > + if (!lo)
> > + return -ENXIO;
> > +
> > + err = mutex_lock_killable(&lo->lo_mutex);
> > + if (err)
> > + return err;
> > +
> > + if (lo->lo_state == Lo_rundown)
> > + err = -ENXIO;
> > + mutex_unlock(&lo->lo_mutex);
> > + return err;
> > +}
>
> Most of this function uses spaces rather than tabs.
I am sorry I should have checked that. Will address the issue in next version
>
> --
> Jens Axboe