
7. Linux Memory Management

7.1 Overview

Linux uses segmentation together with paging, which simplifies notation.

Segments

Linux uses only 4 segments:

  • 2 segments (code and data/stack) for KERNEL SPACE from [0xC000 0000] (3 GB) to [0xFFFF FFFF] (4 GB)
  • 2 segments (code and data/stack) for USER SPACE from [0] (0 GB) to [0xBFFF FFFF] (3 GB)

                               __
   4 GB--->|                |    |
           |     Kernel     |    |  Kernel Space (Code + Data/Stack)
           |                |  __|
   3 GB--->|----------------|  __
           |                |    |
           |                |    |
   2 GB--->|                |    |
           |     Tasks      |    |  User Space (Code + Data/Stack)
           |                |    |
   1 GB--->|                |    |
           |                |    |
           |________________|  __| 
 0x00000000
          Kernel/User Linear addresses
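
The split shown above can be sketched in a few lines of C. This is only an illustration under the assumptions of this section (32-bit linear addresses, 3 GB boundary); the names KERNEL_BASE, is_kernel_address and is_user_address are hypothetical and are not taken from the kernel sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Boundary between user and kernel linear space (3 GB).
 * Hypothetical name, chosen for this sketch only. */
#define KERNEL_BASE 0xC0000000u

/* Sanity check (C11): the boundary really is 3 * 1 GB. */
static_assert(KERNEL_BASE == 3u * (1u << 30), "kernel space starts at 3 GB");

/* True if the linear address lies in kernel space [3 GB, 4 GB). */
bool is_kernel_address(uint32_t linear)
{
    return linear >= KERNEL_BASE;
}

/* True if the linear address lies in user space [0, 3 GB). */
bool is_user_address(uint32_t linear)
{
    return linear < KERNEL_BASE;
}
```

With these definitions, 0xBFFFFFFF is the last user address and 0xC0000000 is the first kernel address.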
 

7.2 Specific i386 implementation

Linux implements paging with 3 levels of page tables, but on the i386 architecture only 2 of them are actually used:

 
   ------------------------------------------------------------------
   L    I    N    E    A    R         A    D    D    R    E    S    S
   ------------------------------------------------------------------
        \___/                 \___/                     \_____/ 
 
     PD offset              PF offset                 Frame offset 
     [10 bits]              [10 bits]                 [12 bits]       
          |                     |                          |
          |                     |     -----------          |        
          |                     |     |  Value  |----------|---------
          |     |         |     |     |---------|   /|\    |        |
          |     |         |     |     |         |    |     |        |
          |     |         |     |     |         |    | Frame offset |
          |     |         |     |     |         |   \|/             |
          |     |         |     |     |---------|<------            |
          |     |         |     |     |         |      |            |
          |     |         |     |     |         |      | x 4096     |
          |     |         |  PF offset|_________|-------            |
          |     |         |       /|\ |         |                   |
      PD offset |_________|-----   |  |         |          _________|
            /|\ |         |    |   |  |         |          | 
             |  |         |    |  \|/ |         |         \|/
 _____       |  |         |    ------>|_________|   PHYSICAL ADDRESS 
|     |     \|/ |         |    x 4096 |         |
| CR3 |-------->|         |           |         |
|_____|         | ....... |           | ....... |
                |         |           |         |    
 
               Page Directory      Page File (page table)

                       Linux i386 Paging
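
The decomposition in the diagram can be written as a short C sketch. It assumes the 10/10/12 bit split shown above and follows the diagram's "x 4096" simplification for turning a frame value into a physical address; the struct and function names are hypothetical and do not come from the kernel sources.

```c
#include <assert.h>
#include <stdint.h>

#define PD_BITS     10u  /* page directory index width */
#define PT_BITS     10u  /* page table ("page file") index width */
#define OFFSET_BITS 12u  /* offset inside a 4096-byte frame */

static_assert(PD_BITS + PT_BITS + OFFSET_BITS == 32u,
              "the three fields cover a 32-bit linear address");

/* The three fields of a 32-bit linear address (hypothetical type). */
struct linear_parts {
    uint32_t pd_index;  /* selects an entry in the page directory */
    uint32_t pt_index;  /* selects an entry in the page table */
    uint32_t offset;    /* byte offset inside the frame */
};

/* Split a linear address into its three fields. */
struct linear_parts split_linear(uint32_t linear)
{
    struct linear_parts p;
    p.pd_index = linear >> (PT_BITS + OFFSET_BITS);
    p.pt_index = (linear >> OFFSET_BITS) & ((1u << PT_BITS) - 1u);
    p.offset   = linear & ((1u << OFFSET_BITS) - 1u);
    return p;
}

/* Physical address = frame value x 4096 + frame offset. */
uint32_t frame_to_physical(uint32_t frame_value, uint32_t offset)
{
    return (frame_value << OFFSET_BITS) + offset;
}
```

For example, split_linear(0xC0101234) gives pd_index 768, pt_index 257 and offset 0x234.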
 


7.3 Memory Mapping

Linux implements access control with paging only, so different tasks share the same segment addresses but have different CR3 values (CR3 is the register that holds the page directory address), pointing to different page entries.

In user mode a task cannot go beyond the 3 GB limit (0xC0000000), so only the first 768 page directory entries are meaningful (768 * 4 MB = 3 GB).

When a task enters kernel mode (through a system call or an IRQ), the other 256 page directory entries become important. They point to the same page tables ("page files" in the diagram above) for every task, namely those of the kernel.
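
The 768/256 split follows from simple arithmetic, which this C sketch spells out. It assumes a 1024-entry page directory in which each entry covers 4 MB; all macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PDE_COUNT  1024u                  /* 2^10 page directory entries */
#define PDE_SPAN   (4u * 1024u * 1024u)   /* each entry maps 1024 * 4 KB = 4 MB */
#define USER_LIMIT 0xC0000000u            /* 3 GB boundary */

static_assert(USER_LIMIT % PDE_SPAN == 0u,
              "the 3 GB limit falls on a page directory entry boundary");

/* Number of page directory entries that describe user space. */
unsigned user_pde_count(void)
{
    return USER_LIMIT / PDE_SPAN;
}

/* Number of page directory entries that describe kernel space. */
unsigned kernel_pde_count(void)
{
    return PDE_COUNT - user_pde_count();
}

/* True if the given page directory index belongs to the kernel part. */
bool pde_is_kernel(unsigned index)
{
    return index >= user_pde_count() && index < PDE_COUNT;
}
```

Here user_pde_count() evaluates to 768 and kernel_pde_count() to 256, matching the figures in the text.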

Note that kernel (and only kernel) linear space maps one-to-one onto kernel physical space, so:

 
            ________________ _____                    
           |Other KernelData|___  |  |                |
           |----------------|   | |__|                |
           |     Kernel     |\  |____|   Real Other   |
  3 GB --->|----------------| \      |   Kernel Data  |
           |                |\ \     |                |
           |              __|_\_\____|__   Real       |
           |      Tasks     |  \ \   |     Tasks      |
           |              __|___\_\__|__   Space      |
           |                |    \ \ |                |
           |                |     \ \|----------------|
           |                |      \ |Real KernelSpace|
           |________________|       \|________________|
      
           Logical Addresses          Physical Addresses
 

Kernel linear space corresponds to kernel physical space shifted by 3 GB: subtracting 3 GB from a kernel linear address gives its physical address (in fact the page tables look something like { "00000000", "00000001" }, so they perform no real virtualization; they simply report the physical addresses derived from the linear ones).
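
As a minimal sketch of this fixed translation, the two helpers below add or subtract the 3 GB offset. The kernel sources express the same idea with PAGE_OFFSET and the __pa() / __va() macros; the function names used here are hypothetical, and the 1 GB bound on physical addresses is an assumption made only to keep the arithmetic inside 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Start of kernel linear space (3 GB). */
#define PAGE_OFFSET 0xC0000000u

/* Kernel linear address -> physical address (3 GB down). */
uint32_t kernel_linear_to_phys(uint32_t linear)
{
    assert(linear >= PAGE_OFFSET);   /* only valid for kernel addresses */
    return linear - PAGE_OFFSET;
}

/* Physical address -> kernel linear address (3 GB up). */
uint32_t kernel_phys_to_linear(uint32_t phys)
{
    assert(phys < 0x40000000u);      /* assumption: must fit below 4 GB */
    return phys + PAGE_OFFSET;
}
```

For instance, the kernel linear address 0xC0100000 corresponds to physical address 0x00100000.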

Note that there is no "address conflict" between kernel and user space, because page tables let us control which physical addresses each of them uses.

7.4 Low level memory allocation

Boot Initialization

We start from kmem_cache_init (launched by start_kernel [init/main.c] at boot up).

|kmem_cache_init
   |kmem_cache_estimate

  • kmem_cache_init [mm/slab.c]
  • kmem_cache_estimate

Now we continue with mem_init (also launched by start_kernel [init/main.c]):

|mem_init
   |free_all_bootmem
      |free_all_bootmem_core

  • mem_init [arch/i386/mm/init.c]
  • free_all_bootmem [mm/bootmem.c]
  • free_all_bootmem_core

Run-time allocation

Under Linux, when we want to allocate memory, for example during the "copy_on_write" mechanism (see Chap. 10), we call:

|copy_mm 
   |allocate_mm = kmem_cache_alloc
      |__kmem_cache_alloc
         |kmem_cache_alloc_one
            |alloc_new_slab
               |kmem_cache_grow
                  |kmem_getpages
                     |__get_free_pages
                        |alloc_pages
                           |alloc_pages_pgdat
                              |__alloc_pages
                                 |rmqueue   
                                 |reclaim_pages

Functions can be found under:

  • copy_mm [kernel/fork.c]
  • allocate_mm [kernel/fork.c]
  • kmem_cache_alloc [mm/slab.c]
  • __kmem_cache_alloc
  • kmem_cache_alloc_one
  • alloc_new_slab
  • kmem_cache_grow
  • kmem_getpages
  • __get_free_pages [mm/page_alloc.c]
  • alloc_pages [mm/numa.c]
  • alloc_pages_pgdat
  • __alloc_pages [mm/page_alloc.c]
  • rmqueue
  • reclaim_pages [mm/vmscan.c]
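
At the bottom of this chain, __get_free_pages takes an "order" argument and hands back 2^order contiguous pages (order 0 is a single page). The sketch below shows how a byte size maps to an order, in the spirit of the kernel's get_order() helper. It is a user-space illustration only: it assumes 4096-byte pages, and the names SKETCH_PAGE_SIZE, order_for_size and pages_for_order are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for this sketch (i386 uses 4096-byte pages). */
#define SKETCH_PAGE_SIZE 4096u

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= size. */
unsigned order_for_size(size_t size)
{
    unsigned order = 0;
    size_t span = SKETCH_PAGE_SIZE;

    assert(size > 0);
    while (span < size) {   /* double the block until it is large enough */
        span <<= 1;
        order++;
    }
    return order;
}

/* Number of pages in a block of the given order. */
size_t pages_for_order(unsigned order)
{
    return (size_t)1 << order;
}
```

A request for 4097 bytes therefore needs order 1 (two pages), and a request for 8193 bytes needs order 2 (four pages).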

TODO: Understand Zones

7.5 Swap

Overview

Swap is managed by the kswapd daemon (kernel thread).

kswapd

Like other kernel threads, kswapd has a main loop that waits to be woken up.

|kswapd
   |// initialization routines
   |for (;;) { // Main loop
      |do_try_to_free_pages
      |recalculate_vm_stats
      |refill_inactive_scan
      |run_task_queue
      |interruptible_sleep_on_timeout // we sleep until the next swap request
   |}

  • kswapd [mm/vmscan.c]
  • do_try_to_free_pages
  • recalculate_vm_stats [mm/swap.c]
  • refill_inactive_scan [mm/vmscan.c]
  • run_task_queue [kernel/softirq.c]
  • interruptible_sleep_on_timeout [kernel/sched.c]

When do we need swapping?

Swapping is needed when we have to access a page that is not in physical memory.

Linux uses the "kswapd" kernel thread for this purpose. When a task receives a page fault exception, the following happens:

 
 | Page Fault Exception
 | caused when all of these conditions hold:
 |   a-) User page 
 |   b-) Read or write access 
 |   c-) Page not present
 |
 |
 -----------> |do_page_fault
                 |handle_mm_fault
                    |pte_alloc 
                       |pte_alloc_one
                          |__get_free_page = __get_free_pages
                             |alloc_pages
                                |alloc_pages_pgdat
                                   |__alloc_pages
                                      |wakeup_kswapd // We wake up kernel thread kswapd
   
                   Page Fault ICA
 

  • do_page_fault [arch/i386/mm/fault.c]
  • handle_mm_fault [mm/memory.c]
  • pte_alloc
  • pte_alloc_one [include/asm/pgalloc.h]
  • __get_free_page [include/linux/mm.h]
  • __get_free_pages [mm/page_alloc.c]
  • alloc_pages [mm/numa.c]
  • alloc_pages_pgdat
  • __alloc_pages
  • wakeup_kswapd [mm/vmscan.c]
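
The three conditions listed in the diagram above correspond to the low bits of the error code that the i386 CPU pushes on a page fault (bit 0: protection violation rather than page not present, bit 1: write access, bit 2: fault raised in user mode). The sketch below decodes them. The macro, struct and function names are hypothetical and are not taken from arch/i386/mm/fault.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* i386 page fault error code bits (names are hypothetical). */
#define PF_PROT  0x1u  /* 0: page not present, 1: protection violation */
#define PF_WRITE 0x2u  /* 0: read access,      1: write access */
#define PF_USER  0x4u  /* 0: kernel mode,      1: user mode */

static_assert((PF_PROT & PF_WRITE) == 0u && (PF_WRITE & PF_USER) == 0u &&
              (PF_PROT & PF_USER) == 0u, "flags must be distinct bits");

/* Decoded view of the error code. */
struct fault_info {
    bool not_present;
    bool write;
    bool user;
};

struct fault_info decode_fault(uint32_t error_code)
{
    struct fault_info f;
    f.not_present = (error_code & PF_PROT) == 0u;
    f.write       = (error_code & PF_WRITE) != 0u;
    f.user        = (error_code & PF_USER) != 0u;
    return f;
}

/* True for the case in the diagram: user page, page not present.
 * Read and write accesses both qualify, so the write bit is ignored. */
bool is_user_not_present_fault(uint32_t error_code)
{
    struct fault_info f = decode_fault(error_code);
    return f.user && f.not_present;
}
```

An error code of 0x4 (user, read, not present) or 0x6 (user, write, not present) matches the case above, while 0x5 (user, read, protection violation) does not.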
