When I added support for restricting generic netlink multicast groups
to subscribers with CAP_NET_ADMIN, I was unaware that a genl_bind
implementation had already existed in the past.
It was reverted due to an ABBA deadlock:
1. ->netlink_bind gets called with the table lock held.
2. genetlink bind callback is invoked, it grabs the genl lock.
But when a new genl subsystem is (un)registered, these two locks are
taken in reverse order.
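
As a user-space illustration of the inversion described above (a sketch
only: table_lock and genl_lock are plain pthread mutexes standing in for
the kernel locks, and every function name is hypothetical), two threads
each take their first lock and then probe the second one with trylock so
that the demo reports the conflict instead of hanging:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-ins for the netlink table lock and the genl lock (assumption). */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t genl_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t sync_point;

struct lock_path {
	pthread_mutex_t *first;   /* lock taken first on this path */
	pthread_mutex_t *second;  /* lock taken while holding 'first' */
	bool got_second;
};

static void *run_path(void *arg)
{
	struct lock_path *p = arg;

	pthread_mutex_lock(p->first);
	/* Wait until the other thread also holds its first lock. */
	pthread_barrier_wait(&sync_point);
	/* A blocking lock here would never return; trylock reports it. */
	p->got_second = pthread_mutex_trylock(p->second) == 0;
	/* Keep 'first' held until both threads have made their attempt. */
	pthread_barrier_wait(&sync_point);
	if (p->got_second)
		pthread_mutex_unlock(p->second);
	pthread_mutex_unlock(p->first);
	return NULL;
}

/* Returns true when neither path could take its second lock. */
static bool abba_would_deadlock(void)
{
	/* bind path: table lock, then genl lock */
	struct lock_path bind_path = { &table_lock, &genl_lock, false };
	/* (un)register path: genl lock, then table lock */
	struct lock_path register_path = { &genl_lock, &table_lock, false };
	pthread_t a, b;
	int err;

	err = pthread_barrier_init(&sync_point, NULL, 2);
	assert(err == 0);
	err = pthread_create(&a, NULL, run_path, &bind_path);
	assert(err == 0);
	err = pthread_create(&b, NULL, run_path, &register_path);
	assert(err == 0);
	(void)err;
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	pthread_barrier_destroy(&sync_point);
	return !bind_path.got_second && !register_path.got_second;
}
```

With blocking locks in place of trylock, both threads would wait on each
other forever, which is the situation lockdep flags as ABBA.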
One solution would be to revert again and add a comment in genl
referring to 1e82a62fec ("genetlink: remove genl_bind").
This would need a second change in mptcp to not expose the raw token
value anymore, e.g. by hashing the token with a secret key so userspace
can still associate subflow events with the correct mptcp connection.
However, Paolo Abeni reminded me to double-check why the netlink table is
locked in the first place.
I can't find a reason. netlink_bind() is already called without this lock
when userspace joins a group via NETLINK_ADD_MEMBERSHIP setsockopt.
Same holds for the netlink_unbind operation.
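
For reference, the setsockopt path mentioned above looks roughly like
this from userspace. This is a minimal sketch: join_netlink_group() is a
hypothetical helper, and the SOL_NETLINK fallback value is only there
for older libc headers that do not define it.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <unistd.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif

/*
 * Open a netlink socket for 'protocol' and subscribe it to multicast
 * group 'group'. Returns the socket fd, or -1 on failure.
 */
static int join_netlink_group(int protocol, unsigned int group)
{
	int fd = socket(AF_NETLINK, SOCK_RAW, protocol);

	if (fd < 0)
		return -1;
	/* Join the group after socket creation, without calling bind(2). */
	if (setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
		       &group, sizeof(group)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A call such as join_netlink_group(NETLINK_ROUTE, RTNLGRP_LINK) subscribes
to link notifications. For a generic netlink family, the numeric group id
would first have to be resolved through the genl controller.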
Digging through the history, commit f773608026
("netlink: access nlk groups safely in netlink bind and getname")
expanded the lock scope.
Commit 3a20773bee ("net: netlink: cap max groups which will be considered in netlink_bind()")
later removed the nlk->ngroups access that the lock scope
extension was all about.
Reduce the lock scope again and always call ->netlink_bind without
the table lock.
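
The idea of the reduced lock scope can be sketched in user space as
follows. This is an analogy under stated assumptions, not the kernel
change itself: table_lock and family_lock are pthread mutexes, and
bind_cb, demo_bind_cb() and bind_group() are hypothetical names. The
callback is only invoked after the table lock has been dropped, so it
may take its own lock without creating a table -> family ordering.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the netlink table lock and a family lock (assumption). */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t family_lock = PTHREAD_MUTEX_INITIALIZER;

/* Per-protocol bind callback, protected by table_lock in this sketch. */
static bool (*bind_cb)(int group);

/* Set by the demo callback: was table_lock free while it ran? */
static bool table_lock_was_free;

static bool demo_bind_cb(int group)
{
	bool ok;

	/* Probe the table lock; success means the caller dropped it. */
	if (pthread_mutex_trylock(&table_lock) == 0) {
		table_lock_was_free = true;
		pthread_mutex_unlock(&table_lock);
	} else {
		table_lock_was_free = false;
	}

	/* The callback is free to take its own lock here. */
	pthread_mutex_lock(&family_lock);
	ok = group > 0;
	pthread_mutex_unlock(&family_lock);
	return ok;
}

static bool bind_group(int group)
{
	bool (*cb)(int);

	/* Hold the table lock only while reading table state. */
	pthread_mutex_lock(&table_lock);
	cb = bind_cb;
	pthread_mutex_unlock(&table_lock);

	/* Invoke the callback with no table lock held. */
	return cb ? cb(group) : true;
}
```

After setting bind_cb = demo_bind_cb, a call to bind_group(3) runs the
callback with table_lock released, which table_lock_was_free records.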
The Fixes tag should reference the patch mentioned in the link below,
but that one got squash-merged into the patch that came earlier in the
series.
Fixes: 4d54cc3211 ("mptcp: avoid lock_fast usage in accept path")
Cc: Cong Wang <firstname.lastname@example.org>
Cc: Xin Long <email@example.com>
Cc: Johannes Berg <firstname.lastname@example.org>
Cc: Sean Tranchetti <email@example.com>
Cc: Paolo Abeni <firstname.lastname@example.org>
Cc: Pablo Neira Ayuso <email@example.com>
Signed-off-by: Florian Westphal <firstname.lastname@example.org>
Signed-off-by: David S. Miller <email@example.com>