7. Linux Memory Management
7.1 Overview
Linux uses segmentation together with paging, which simplifies the notation.
Segments
Linux uses only 4 segments:
- 2 segments (code and data/stack) for KERNEL SPACE from [0xC000 0000] (3 GB) to [0xFFFF FFFF] (4 GB)
- 2 segments (code and data/stack) for USER SPACE from [0] (0 GB) to [0xBFFF FFFF] (3 GB)
            ________________   __
   4 GB--->|                |    |
           |     Kernel     |    |  Kernel Space (Code + Data/Stack)
           |                |  __|
   3 GB--->|----------------|  __
           |                |    |
           |                |    |
   2 GB--->|                |    |
           |     Tasks      |    |  User Space (Code + Data/Stack)
           |                |    |
   1 GB--->|                |    |
           |                |    |
           |________________|  __|
 0x00000000
          Kernel/User Linear addresses
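The 3 GB / 1 GB split above can be expressed as a simple comparison on a 32-bit linear address. The following is only a minimal sketch: the names KERNEL_BASE, is_kernel_address and is_user_address are illustrative and are not taken from the kernel sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* First linear address of Kernel Space (3 GB), as described above.
 * The macro name is an assumption made for this example. */
#define KERNEL_BASE 0xC0000000u

/* True if the linear address lies in [0xC0000000, 0xFFFFFFFF]. */
bool is_kernel_address(uint32_t linear)
{
    return linear >= KERNEL_BASE;
}

/* True if the linear address lies in [0x00000000, 0xBFFFFFFF]. */
bool is_user_address(uint32_t linear)
{
    return linear < KERNEL_BASE;
}
```

For example, is_user_address(0xBFFFFFFF) is true, while is_kernel_address(0xC0000000) is true.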
7.2 Specific i386 implementation
Again, Linux implements paging with a 3-level scheme, but on the i386 architecture only 2 of the levels are actually used:
   ------------------------------------------------------------------
   L    I    N    E    A    R         A    D    D    R    E    S    S
   ------------------------------------------------------------------
        \___/                 \___/                     \_____/

     PD offset              PF offset                 Frame offset
     [10 bits]              [10 bits]                 [12 bits]
          |                     |                          |
          |                     |     -----------          |
          |                     |     |  Value  |----------|---------
          |     |         |     |     |---------|   /|\    |        |
          |     |         |     |     |         |    |     |        |
          |     |         |     |     |         |    | Frame offset |
          |     |         |     |     |         |   \|/             |
          |     |         |     |     |---------|<------            |
          |     |         |     |     |         |      |            |
          |     |         |     |     |         |      | x 4096     |
          |     |         |  PF offset|_________|-------            |
          |     |         |       /|\ |         |                   |
      PD offset |_________|-----   |  |         |          _________|
            /|\ |         |    |   |  |         |          |
             |  |         |    |  \|/ |         |         \|/
 _____       |  |         |    ------>|_________|   PHYSICAL ADDRESS
|     |     \|/ |         |    x 4096 |         |
| CR3 |-------->|         |           |         |
|_____|         | ....... |           | ....... |
                |         |           |         |

               Page Directory          Page File

                       Linux i386 Paging
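The 10 + 10 + 12 bit split shown in the diagram can be reproduced with shifts and masks. This is a minimal user-space sketch, not kernel code: the struct and function names are invented for the example, and the frame number passed to physical_address stands for the value read from the Page File entry.

```c
#include <stdint.h>

/* The three fields of a 32-bit i386 linear address. */
struct linear_parts {
    uint32_t pd_index;  /* PD offset: top 10 bits, selects a Page Directory entry */
    uint32_t pt_index;  /* PF offset: middle 10 bits, selects a Page File entry */
    uint32_t offset;    /* Frame offset: low 12 bits, byte within the 4096-byte frame */
};

/* Split a linear address into its three fields. */
struct linear_parts split_linear(uint32_t linear)
{
    struct linear_parts p;
    p.pd_index = linear >> 22;
    p.pt_index = (linear >> 12) & 0x3FFu;
    p.offset   = linear & 0xFFFu;
    return p;
}

/* Physical address = frame number x 4096 + frame offset. */
uint32_t physical_address(uint32_t frame_number, uint32_t offset)
{
    return (frame_number << 12) | (offset & 0xFFFu);
}
```

For the linear address 0xC0101234 this gives PD offset 768, PF offset 257 (0x101) and frame offset 0x234.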
7.3 Memory Mapping
Linux manages access control with paging only, so different Tasks have the same segment addresses but different CR3 values (CR3 is the register that stores the Page Directory address), and therefore point to different page table entries.
In User Mode a Task cannot go beyond the 3 GB limit (0xC0000000), so only the first 768 page directory entries are meaningful (768 * 4 MB = 3 GB).
When a Task enters Kernel Mode (through a system call or an IRQ), the other 256 page directory entries become relevant; they point to the same page tables ("page files") for every Task, namely those of the Kernel.
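The arithmetic behind the 768 / 256 split of the page directory can be checked with a few lines of C. This is only a sketch; the macro and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PD_ENTRIES      1024u                  /* entries in an i386 page directory */
#define PD_ENTRY_SPAN   (4u * 1024u * 1024u)   /* each entry covers 4 MB */
#define USER_PD_ENTRIES 768u                   /* 3 GB / 4 MB */

/* First linear address covered by the given page directory entry. */
uint32_t pd_entry_base(uint32_t index)
{
    return index * PD_ENTRY_SPAN;
}

/* True for the 256 entries (768..1023) that map Kernel Space. */
bool pd_entry_is_kernel(uint32_t index)
{
    return index >= USER_PD_ENTRIES && index < PD_ENTRIES;
}
```

Entry 768 starts at 0xC0000000, the 3 GB limit, and entries 768 to 1023 are the 256 kernel entries.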
Note that the Kernel (and only the Kernel) Linear Space maps directly onto Kernel Physical Space, so:
            ________________ _____
           |Other KernelData|___  |  |                |
           |----------------|   | |__|                |
           |     Kernel     |\  |____|   Real Other   |
  3 GB --->|----------------| \      |   Kernel Data  |
           |                |\ \     |                |
           |              __|_\_\____|__   Real       |
           |      Tasks     |  \ \   |     Tasks      |
           |              __|___\_\__|__   Space      |
           |                |    \ \ |                |
           |                |     \ \|----------------|
           |                |      \ |Real KernelSpace|
           |________________|       \|________________|

           Logical Addresses          Physical Addresses
Linear Kernel Space corresponds to Physical Kernel Space shifted down by 3 GB. In fact the kernel page tables look something like { "00000000", "00000001" }: they perform no real virtualization and simply return the physical address obtained from the linear one.
Notice that there is no "address conflict" between Kernel and User spaces, because physical addresses are managed through the Page Tables.
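Since the kernel mapping is just a fixed 3 GB shift, converting between kernel linear and physical addresses is a subtraction or an addition. The sketch below assumes the default 0xC0000000 offset described in this chapter; the function names are illustrative, although the kernel itself offers comparable __pa() and __va() macros on i386.

```c
#include <stdint.h>

/* Start of Kernel Linear Space (3 GB), assumed fixed for this sketch. */
#define PAGE_OFFSET 0xC0000000u

/* Kernel linear address -> physical address (shift 3 GB down). */
uint32_t kernel_virt_to_phys(uint32_t linear)
{
    return linear - PAGE_OFFSET;
}

/* Physical address -> kernel linear address (shift 3 GB up). */
uint32_t kernel_phys_to_virt(uint32_t physical)
{
    return physical + PAGE_OFFSET;
}
```

For instance, the kernel linear address 0xC0100000 corresponds to physical address 0x00100000 (1 MB). The conversion is valid only for addresses inside the directly mapped kernel range.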
7.4 Low level memory allocation
Boot Initialization
We start from kmem_cache_init (launched by start_kernel [init/main.c] at boot up).
|kmem_cache_init
   |kmem_cache_estimate
kmem_cache_init [mm/slab.c]
kmem_cache_estimate
Now we continue with mem_init (also launched by start_kernel [init/main.c]).
|mem_init
   |free_all_bootmem
      |free_all_bootmem_core
mem_init [arch/i386/mm/init.c]
free_all_bootmem [mm/bootmem.c]
free_all_bootmem_core
Run-time allocation
Under Linux, when we want to allocate memory, for example during the "copy_on_write" mechanism (see Chapter 10), we call:
|copy_mm
   |allocate_mm = kmem_cache_alloc
      |__kmem_cache_alloc
         |kmem_cache_alloc_one
            |alloc_new_slab
               |kmem_cache_grow
                  |kmem_getpages
                     |__get_free_pages
                        |alloc_pages
                           |alloc_pages_pgdat
                              |__alloc_pages
                                 |rmqueue
                                 |reclaim_pages
Functions can be found under:
- copy_mm [kernel/fork.c]
- allocate_mm [kernel/fork.c]
- kmem_cache_alloc [mm/slab.c]
- __kmem_cache_alloc
- kmem_cache_alloc_one
- alloc_new_slab
- kmem_cache_grow
- kmem_getpages
- __get_free_pages [mm/page_alloc.c]
- alloc_pages [mm/numa.c]
- alloc_pages_pgdat
- __alloc_pages [mm/page_alloc.c]
- rmqueue
- reclaim_pages [mm/vmscan.c]
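The call chain above has two layers: an object cache (kmem_cache_alloc) that, when it runs out of free objects, grows by asking the page allocator (__get_free_pages) for more memory. The following user-space toy only illustrates that layering. It is not the kernel's slab allocator: all toy_* names are hypothetical, malloc stands in for the page allocator, and pages are never returned.

```c
#include <stddef.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096u

/* Toy object cache: free objects are chained through their first word. */
struct toy_cache {
    size_t obj_size;       /* object size, rounded up to pointer size */
    void *free_list;       /* head of the list of free objects */
    unsigned pages_grown;  /* number of pages requested so far */
};

/* Hypothetical stand-in for __get_free_pages: returns one page. */
static void *toy_get_page(void)
{
    return malloc(TOY_PAGE_SIZE);
}

void toy_cache_init(struct toy_cache *c, size_t obj_size)
{
    if (obj_size < sizeof(void *))
        obj_size = sizeof(void *);
    /* Round up so every object can hold an aligned pointer. */
    obj_size = (obj_size + sizeof(void *) - 1) / sizeof(void *) * sizeof(void *);
    c->obj_size = obj_size;
    c->free_list = NULL;
    c->pages_grown = 0;
}

/* Plays the role of kmem_cache_grow: get a page and slice it into objects. */
static int toy_cache_grow(struct toy_cache *c)
{
    char *page;
    size_t off;

    if (c->obj_size > TOY_PAGE_SIZE)
        return -1;               /* object does not fit in one page */
    page = toy_get_page();
    if (page == NULL)
        return -1;
    for (off = 0; off + c->obj_size <= TOY_PAGE_SIZE; off += c->obj_size) {
        void **obj = (void **)(page + off);
        *obj = c->free_list;     /* push the object on the free list */
        c->free_list = obj;
    }
    c->pages_grown++;
    return 0;
}

/* Plays the role of kmem_cache_alloc: pop a free object, growing if needed. */
void *toy_cache_alloc(struct toy_cache *c)
{
    void **obj;

    if (c->free_list == NULL && toy_cache_grow(c) != 0)
        return NULL;
    obj = c->free_list;
    c->free_list = *obj;
    return obj;
}

/* Return an object to the cache. */
void toy_cache_free(struct toy_cache *c, void *p)
{
    void **obj = p;
    *obj = c->free_list;
    c->free_list = obj;
}
```

With 64-byte objects one page holds 64 of them, so the 65th allocation is the first one that forces the cache to grow again.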
TODO: Understand Zones
7.5 Swap
Overview
Swap is managed by the kswapd daemon (kernel thread).
kswapd
Like other kernel threads, kswapd has a main loop that sleeps until it is woken up.
|kswapd
   |// initialization routines
   |for (;;) { // Main loop
      |do_try_to_free_pages
      |recalculate_vm_stats
      |refill_inactive_scan
      |run_task_queue
      |interruptible_sleep_on_timeout // we sleep for a new swap request
   |}
- kswapd [mm/vmscan.c]
- do_try_to_free_pages
- recalculate_vm_stats [mm/swap.c]
- refill_inactive_scan [mm/vmscan.c]
- run_task_queue [kernel/softirq.c]
- interruptible_sleep_on_timeout [kernel/sched.c]
When do we need swapping?
Swapping is needed when we have to access a page that is not in physical memory.
Linux uses the "kswapd" kernel thread for this purpose. When a Task receives a page fault exception, the following happens:
 | Page Fault Exception
 | caused by all these conditions:
 |   a-) User page
 |   b-) Read or write access
 |   c-) Page not present
 |
 |
 -----------> |do_page_fault
                 |handle_mm_fault
                    |pte_alloc
                       |pte_alloc_one
                          |__get_free_page = __get_free_pages
                             |alloc_pages
                                |alloc_pages_pgdat
                                   |__alloc_pages
                                      |wakeup_kswapd // We wake up kernel thread kswapd

                   Page Fault ICA
- do_page_fault [arch/i386/mm/fault.c]
- handle_mm_fault [mm/memory.c]
- pte_alloc
- pte_alloc_one [include/asm/pgalloc.h]
- __get_free_page [include/linux/mm.h]
- __get_free_pages [mm/page_alloc.c]
- alloc_pages [mm/numa.c]
- alloc_pages_pgdat
- __alloc_pages
- wakeup_kswapd [mm/vmscan.c]
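The three conditions listed for the page fault (user page, read or write access, page not present) can be read from the error code that the i386 CPU pushes on a page fault. The sketch below decodes those bits. The bit layout is the one defined by the processor, but the macro and function names are invented for the example, and treating "user page" as "fault raised in user mode" is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* i386 page fault error code bits. */
#define PF_PROT  0x1u  /* 0: page not present, 1: protection violation */
#define PF_WRITE 0x2u  /* 0: read access,      1: write access */
#define PF_USER  0x4u  /* 0: kernel mode,      1: user mode */

/* True when the fault matches conditions a-) and c-) above:
 * raised from user mode on a page that is not present.
 * Condition b-) accepts both reads and writes, so it is not tested. */
bool fault_needs_page_in(uint32_t error_code)
{
    bool user        = (error_code & PF_USER) != 0;
    bool not_present = (error_code & PF_PROT) == 0;
    return user && not_present;
}

/* True if the faulting access was a write. */
bool fault_is_write(uint32_t error_code)
{
    return (error_code & PF_WRITE) != 0;
}
```

An error code of 0x4 (user mode, read, not present) or 0x6 (user mode, write, not present) satisfies the conditions, while 0x7 (protection violation) does not.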