自从看了 Linus 喷人的帖子以后,反省了一下,在此分享一个好的 Pull Request。
从 Linus Kernel 看来的帖子,结构如下:
mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGE…
…S block during isolation for migration
commit 0bf380b upstream.
When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned. Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned. This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash. This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.
PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s"
#0 [d72d3ad0] crash_kexec at c028cfdb
#1 [d72d3b24] oops_end at c05c5322
#2 [d72d3b38] __bad_area_nosemaphore at c0227e60
#3 [d72d3bec] bad_area at c0227fb6
#4 [d72d3c00] do_page_fault at c05c72ec
#5 [d72d3c80] error_code (via page_fault) at c05c47a4
EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000
DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50
CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002
#6 [d72d3cb4] isolate_migratepages at c030b15a
#7 [d72d3d14] zone_watermark_ok at c02d26cb
#8 [d72d3d2c] compact_zone at c030b8de
#9 [d72d3d68] compact_zone_order at c030bba1
#10 [d72d3db4] try_to_compact_pages at c030bc84
#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
#14 [d72d3eb8] alloc_pages_vma at c030a845
#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
#16 [d72d3f00] handle_mm_fault at c02f36c6
#17 [d72d3f30] do_page_fault at c05c70ed
#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431
DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788
SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50
CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202
It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.
BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000
It is expected that it also affects 3.2.x and current mainline.
The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned. Lets say we have a case
like this
H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole
H------|------H------|----m-Hoooooo|ooooooH-f----|------H
The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole. When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.
This patch ensures that isolate_migratepages calls pfn_valid when
necessary.
Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
-------