Related notices:
- [LSN-0107-1] Linux kernel vulnerability
- [USN-6882-2] Cinder regression
- [USN-7088-4] Linux kernel vulnerabilities
- [USN-7095-1] Linux kernel vulnerabilities
- [USN-7089-3] Linux kernel vulnerabilities
Linux kernel vulnerabilities
A security issue affects these releases of Ubuntu and its derivatives:
- Ubuntu 22.04 LTS
- Ubuntu 20.04 LTS
- Ubuntu 18.04 LTS
- Ubuntu 16.04 LTS
- Ubuntu 14.04 LTS
Summary
Several security issues were fixed in the kernel.
Software Description
- linux - Linux kernel
- linux-aws - Linux kernel for Amazon Web Services (AWS) systems
- linux-azure - Linux kernel for Microsoft Azure Cloud systems
- linux-gcp - Linux kernel for Google Cloud Platform (GCP) systems
- linux-gke - Linux kernel for Google Container Engine (GKE) systems
- linux-gkeop - Linux kernel for Google Container Engine (GKE) on-premises systems
- linux-ibm - Linux kernel for IBM cloud systems
- linux-oracle - Linux kernel for Oracle Cloud systems
Details
In the Linux kernel, the following vulnerability has been resolved:

inet: inet_defrag: prevent sk release while still in use

ip_local_out() and other functions can pass skb->sk as a function argument. If the skb is a fragment and reassembly happens before such a function call returns, the sk must not be released. This affects skb fragments reassembled via netfilter or similar modules, e.g. openvswitch or ct_act.c, when run as part of the tx pipeline.

Eric Dumazet made an initial analysis of this bug. Quoting Eric:

  Calling ip_defrag() in the output path is also implying skb_orphan(), which is buggy because the output path relies on sk not disappearing. A relevant old patch about the issue was: 8282f27449bf ("inet: frag: Always orphan skbs inside ip_defrag()") [..] net/ipv4/ip_output.c depends on skb->sk being set, and probably on an inet socket, not an arbitrary one. If we orphan the packet in ipvlan, then downstream things like the FQ packet scheduler will not work properly. We need to change ip_defrag() to only use skb_orphan() when really needed, ie whenever frag_list is going to be used.

Eric suggested stashing sk in the fragment queue and made an initial patch. However, there is a problem with this: if the skb is refragmented again right after, ip_do_fragment() will copy head->sk to the new fragments and set up the destructor to sock_wfree. IOW, we have no choice but to fix up sk_wmem accounting to reflect the fully reassembled skb, else wmem will underflow.

This change moves the orphan down into the core, to the last possible moment. As ip_defrag_offset is aliased with the sk_buff->sk member, we must move the offset into the FRAG_CB, else skb->sk gets clobbered. This allows delaying the orphaning long enough to learn whether the skb has to be queued or whether the skb completes the reasm queue. In the former case, things work as before: the skb is orphaned. This is safe because the skb gets queued/stolen and won't continue past the reasm engine. In the latter case, we steal the skb->sk reference, reattach it to the head skb, and fix up wmem accounting when inet_frag inflates truesize. (CVE-2024-26921)
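The wmem fix-up described above can be illustrated with a toy model (Python, hypothetical names; this is not the kernel API): if the charge on the stashed socket is not inflated when reassembly inflates the head skb's truesize, releasing the reassembled skb uncharges more than was ever charged and wmem underflows.

```python
# Toy model of sk_wmem accounting across fragment reassembly.
# All names (ToySock, charge, release, reassemble) are hypothetical.

class ToySock:
    """Stands in for struct sock; tracks charged write memory."""
    def __init__(self):
        self.wmem_alloc = 0

class ToySkb:
    """Stands in for sk_buff; truesize is the charged buffer size."""
    def __init__(self, truesize):
        self.truesize = truesize
        self.sk = None

def charge(sk, skb):
    """sock_wmalloc-like: charge the socket for the buffer it owns."""
    sk.wmem_alloc += skb.truesize
    skb.sk = sk

def release(skb):
    """sock_wfree-like: uncharge on free; must never underflow."""
    skb.sk.wmem_alloc -= skb.truesize
    assert skb.sk.wmem_alloc >= 0, "wmem underflow"

def reassemble(head, frag, fix_up_wmem):
    """inet_frag-like: fold frag into head, inflating head.truesize.
    With fix_up_wmem=False the socket stays charged only for the
    original head, so releasing the reassembled skb underflows."""
    head.truesize += frag.truesize
    if fix_up_wmem and head.sk is not None:
        head.sk.wmem_alloc += frag.truesize

sk = ToySock()
head = ToySkb(100)   # only the head fragment was charged to the socket
frag = ToySkb(100)
charge(sk, head)
reassemble(head, frag, fix_up_wmem=True)   # charge grows with truesize
release(head)
assert sk.wmem_alloc == 0                  # accounting stays balanced
```

Dropping `fix_up_wmem` makes `release()` trip the underflow assertion, which is the accounting bug the commit message describes.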
In the Linux kernel, the following vulnerability has been resolved:

af_unix: Fix garbage collector racing against connect()

The garbage collector does not take into account the risk of an embryo getting enqueued during the garbage collection. If such an embryo has a peer that carries SCM_RIGHTS, two consecutive passes of scan_children() may see a different set of children, leading to an incorrectly elevated inflight count, and then a dangling pointer within the gc_inflight_list.

Sockets are AF_UNIX/SOCK_STREAM:
  S is an unconnected socket.
  L is a listening in-flight socket bound to addr, not in the fdtable.
  V's fd will be passed via sendmsg(), and gets its inflight count bumped.

connect(S, addr)           sendmsg(S, [V]); close(V)      __unix_gc()
----------------           -------------------------      -----------
NS = unix_create1()
skb1 = sock_wmalloc(NS)
L = unix_find_other(addr)
unix_state_lock(L)
unix_peer(S) = NS
                           // V count=1 inflight=0
                           NS = unix_peer(S)
                           skb2 = sock_alloc()
                           skb_queue_tail(NS, skb2[V])
                           // V became in-flight
                           // V count=2 inflight=1
                           close(V)
                           // V count=1 inflight=1
                           // GC candidate condition met
                                                          for u in gc_inflight_list:
                                                            if (total_refs == inflight_refs)
                                                              add u to gc_candidates
                                                          // gc_candidates={L, V}
                                                          for u in gc_candidates:
                                                            scan_children(u, dec_inflight)
                                                          // embryo (skb1) was not
                                                          // reachable from L yet, so V's
                                                          // inflight remains unchanged
__skb_queue_tail(L, skb1)
unix_state_unlock(L)
                                                          for u in gc_candidates:
                                                            if (u.inflight)
                                                              scan_children(u, inc_inflight_move_tail)
                                                          // V count=1 inflight=2 (!)

If there is a GC-candidate listening socket, lock/unlock its state. This makes GC wait until the end of any ongoing connect() to that socket. After flipping the lock, a possibly SCM-laden embryo is already enqueued. And if there is another embryo coming, it cannot possibly carry SCM_RIGHTS. At this point, unix_inflight() cannot happen because unix_gc_lock is already taken. The inflight graph remains unaffected. (CVE-2024-26923)
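The interleaving above can be reduced to a toy model (Python, hypothetical names; this is not kernel code): the GC's decrement and increment passes over a listener's children disagree when a racing connect() enqueues an SCM-laden embryo between them.

```python
# Toy model of the __unix_gc() / connect() race. All names are
# hypothetical stand-ins for the kernel structures.

class Sock:
    """A unix socket: file refcount, inflight count, and a queue
    (embryos for a listener, passed sockets for an embryo)."""
    def __init__(self, name):
        self.name = name
        self.count = 1
        self.inflight = 0
        self.queue = []

def scan_children(listener, delta):
    """scan_children()-like: adjust inflight of every socket passed
    inside an embryo currently queued on the listener."""
    for embryo in listener.queue:
        for passed in embryo.queue:
            passed.inflight += delta

def unix_gc(listener, racing_connect=None):
    """The two passes of __unix_gc(). racing_connect, if given,
    stands in for a connect() that enqueues an embryo between the
    dec pass and the inc pass -- the bug."""
    scan_children(listener, -1)   # dec_inflight
    if racing_connect:
        racing_connect()          # embryo (skb1) lands on L here
    scan_children(listener, +1)   # inc_inflight_move_tail

# V was sent inside embryo NS and then closed: count=1, inflight=1.
V = Sock("V"); V.inflight = 1
NS = Sock("NS"); NS.queue = [V]
L = Sock("L")

unix_gc(L, racing_connect=lambda: L.queue.append(NS))
assert V.inflight == 2        # count=1 inflight=2 (!) -- dangling state

# The fix locks L's state first, so connect() completes before
# either pass and both passes see the same set of children.
V2 = Sock("V2"); V2.inflight = 1
NS2 = Sock("NS2"); NS2.queue = [V2]
L2 = Sock("L2"); L2.queue = [NS2]
unix_gc(L2)
assert V2.inflight == 1       # counts stay consistent
```

In the buggy interleaving, V's inflight is incremented by the second pass without ever having been decremented by the first, exactly the `inflight=2` state in the timeline above.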
In the Linux kernel, the following vulnerability has been resolved:

mm: swap: fix race between free_swap_and_cache() and swapoff()

There was previously a theoretical window where swapoff() could run and tear down a swap_info_struct while a call to free_swap_and_cache() was running in another thread. This could cause, amongst other bad possibilities, swap_page_trans_huge_swapped() (called by free_swap_and_cache()) to access the freed memory for swap_map.

This is a theoretical problem and I haven't been able to provoke it from a test case. But there has been agreement based on code review that this is possible (see link below).

Fix it by using get_swap_device()/put_swap_device(), which will stall swapoff(). There was an extra check in _swap_info_get() to confirm that the swap entry was not free. This isn't present in get_swap_device() because it doesn't make sense in general due to the race between getting the reference and swapoff. So I've added an equivalent check directly in free_swap_and_cache().

Details of how to provoke one possible issue (thanks to David Hildenbrand for deriving this):

--8<--
try_to_unuse() will stop as soon as si->inuse_pages==0. So the question is: could someone reclaim the folio and turn si->inuse_pages==0, before we completed swap_page_trans_huge_swapped()?

Imagine the following: a 2 MiB folio in the swapcache. Only 2 subpages are still referenced by swap entries. Process 1 still references subpage 0 via a swap entry. Process 2 still references subpage 1 via a swap entry.

Process 1 quits. Calls free_swap_and_cache().
-> count == SWAP_HAS_CACHE
[then, preempted in the hypervisor etc.]

Process 2 quits. Calls free_swap_and_cache().
-> count == SWAP_HAS_CACHE

Process 2 goes ahead, passes swap_page_trans_huge_swapped(), and calls __try_to_reclaim_swap().

__try_to_reclaim_swap()->folio_free_swap()->delete_from_swap_cache()->
put_swap_folio()->free_swap_slot()->swapcache_free_entries()->
swap_entry_free()->swap_range_free()->
... WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);

What stops swapoff from succeeding after process 2 reclaimed the swap cache but before process 1 finished its call to swap_page_trans_huge_swapped()?
--8<--
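The fix can be sketched as a toy reference-count model (Python, hypothetical names; the real kernel pins the device differently): holding a get_swap_device()-style reference stalls swapoff()'s teardown of swap_map until put_swap_device() drops the last reference.

```python
# Toy model of get_swap_device()/put_swap_device() stalling swapoff().
# All names are hypothetical stand-ins for the kernel structures.

class SwapInfo:
    """Stands in for swap_info_struct."""
    def __init__(self, nr_entries):
        self.users = 1                    # swapoff drops this last ref
        self.swap_map = [1] * nr_entries  # freed only on teardown
        self.torn_down = False

def get_swap_device(si):
    """Pin si so a concurrent swapoff() cannot free swap_map under us.
    Returns None once the device is already gone (the caller-side
    check the commit adds to free_swap_and_cache())."""
    if si.torn_down:
        return None
    si.users += 1
    return si

def put_swap_device(si):
    """Drop the reader's pin; teardown proceeds on the last ref."""
    si.users -= 1
    _maybe_teardown(si)

def swapoff(si):
    """Drop swapoff's own reference; teardown stalls while pinned."""
    si.users -= 1
    _maybe_teardown(si)

def _maybe_teardown(si):
    if si.users == 0:
        si.swap_map = None
        si.torn_down = True

si = SwapInfo(4)
ref = get_swap_device(si)         # reader pins the device
swapoff(si)                       # concurrent swapoff: teardown stalls
assert si.swap_map is not None    # swap_map is still safe to read
put_swap_device(si)               # last reference dropped
assert si.torn_down               # now swapoff's teardown completes
```

Without the pin, the reader in the derivation above could still be walking `swap_map` when swapoff frees it; with it, teardown is deferred until the reader calls `put_swap_device()`.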