Hello Ian,
On 28/7/2023 10:16 am, Ian Kent wrote:
On 28/7/23 08:00, Ian Kent wrote:Yes this was the block I referred to in my second comment [1]. However adding
On 27/7/23 12:30, Imran Khan wrote:Ahh, I see why I thought this ...
Hello Ian,Yes, that's exactly what I intended (assuming you mean write lock in those cases)
Sorry for late reply. I was about to reply this week.
On 27/7/2023 10:38 am, Ian Kent wrote:
On 20/7/23 10:03, Ian Kent wrote:[...]
On Wed, 2023-07-19 at 12:23 +0800, Ian Kent wrote:
kernfs direcytory node's child count changes in kernfs_(un)link_sibling andI do see a problem with recent changes.Anders I was hoping you would check if/what lockdep trace
I'll send this off to Greg after I've done some testing (primarily just
compile and function).
Here's a patch which describes what I found.
Comments are, of course, welcome, ;)
you get with this patch.
Imran, I was hoping you would comment on my change as it
relates to the kernfs_iattr_rwsem changes.
Ian
kernfs: fix missing kernfs_iattr_rwsem locking
From: Ian Kent <raven@xxxxxxxxxx>
When the kernfs_iattr_rwsem was introduced a case was missed.
The update of the kernfs directory node child count was also protected
by the kernfs_rwsem and needs to be included in the change so that the
child count (and so the inode n_link attribute) does not change while
holding the rwsem for read.
these are getting invoked while adding (kernfs_add_one),
removing(__kernfs_remove) or moving (kernfs_rename_ns)a node. Each of these
operations proceed under kernfs_rwsem and I see each invocation of
kernfs_link/unlink_sibling during the above mentioned operations is happening
under kernfs_rwsem.
So the child count should still be protected by kernfs_rwsem and we should not
need to acquire kernfs_iattr_rwsem in kernfs_link/unlink_sibling.
when I did it so now I wonder what I saw that lead to my patch, I'll need to look
again ...
It's the hunk:
@@ -285,10 +285,10 @@ int kernfs_iop_permission(struct mnt_idmap *idmap,
kn = inode->i_private;
root = kernfs_root(kn);
- down_read(&root->kernfs_rwsem);
+ down_read(&root->kernfs_iattr_rwsem);
kernfs_refresh_inode(kn, inode);
ret = generic_permission(&nop_mnt_idmap, inode, mask);
- up_read(&root->kernfs_rwsem);
+ up_read(&root->kernfs_iattr_rwsem);
return ret;
}
which takes away the kernfs_rwsem and introduces the possibility of
the change. It may be more instructive to add back taking the read
lock of kernfs_rwsem in .permission() than altering the sibling link
and unlink functions, I mean I even caught myself on it.
back read lock of kernfs_rwsem in .permission() will again introduce the
bottleneck that I mentioned at [2]. So I think taking taking the locks in
link/unlink is the better option.
I understand having one lock to synchronize everything makes it easier
debug/development wise but sometimes (such as the case mentioned in [2]), it is
not optimum performance wise.
Thoughts ?
Thanks,
Imran
[1]: https://lore.kernel.org/all/8b0a1619-1e39-fc3a-1226-f3b167e64646@xxxxxxxxxx/
[2]: https://lore.kernel.org/all/20230302043203.1695051-2-imran.f.khan@xxxxxxxxxx/
Ian
Kindly let me know your thoughts. I would still like to see new lockdep tracesIndeed, I hope Anders can find time to get the trace.
with this change.
Ian
Thanks,
Imran
Fixes: 9caf696142 (kernfs: Introduce separate rwsem to protect inode
attributes)
Signed-off-by: Ian Kent <raven@xxxxxxxxxx>
Cc: Anders Roxell <anders.roxell@xxxxxxxxxx>
Cc: Imran Khan <imran.f.khan@xxxxxxxxxx>
Cc: Arnd Bergmann <arnd@xxxxxxxx>
Cc: Minchan Kim <minchan@xxxxxxxxxx>
Cc: Eric Sandeen <sandeen@xxxxxxxxxxx>
---
fs/kernfs/dir.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 45b6919903e6..6e84bb69602e 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -383,9 +383,11 @@ static int kernfs_link_sibling(struct kernfs_node
*kn)
rb_insert_color(&kn->rb, &kn->parent->dir.children);
/* successfully added, account subdir number */
+ down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
if (kernfs_type(kn) == KERNFS_DIR)
kn->parent->dir.subdirs++;
kernfs_inc_rev(kn->parent);
+ up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
return 0;
}
@@ -408,9 +410,11 @@ static bool kernfs_unlink_sibling(struct
kernfs_node *kn)
if (RB_EMPTY_NODE(&kn->rb))
return false;
+ down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
if (kernfs_type(kn) == KERNFS_DIR)
kn->parent->dir.subdirs--;
kernfs_inc_rev(kn->parent);
+ up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
rb_erase(&kn->rb, &kn->parent->dir.children);
RB_CLEAR_NODE(&kn->rb);