Chapter 8: Memory Management

Created: 2025-12-16
Updated: 2025-12-16

Background

  • Registers and main memory are the only storage directly accessible to the CPU
  • A process must be loaded into memory before it can be executed
  • Multiple processes are kept in memory and run concurrently
  • A process may be moved between memory and disk during its execution

Address Binding

Multistep Processing of a Program

  • compile time: translate source code to machine code
  • load time: link the program and load it into memory
  • execution time: run the program

Address Binding Timing

  • Compile Time
    • the compiler translates symbolic code into absolute code
    • if the starting location changes → recompile
  • Load Time
    • the compiler translates symbolic code into relocatable code (e.g. .BS + 0x1234)
    • at load time, relocatable addresses are translated to physical addresses (e.g. 0xABCD)
    • if the starting location changes → reload the code
  • Execution Time
    • the compiler translates symbolic code into logical-address (virtual-address) code (e.g. 0x18)
    • even after loading, the code keeps using logical addresses (0x18)
    • special hardware, the Memory Management Unit (MMU), maps logical addresses to physical addresses at run time
    • even the CPU does not know the physical address; it only uses logical addresses

Memory-Management Unit (MMU)

  • hardware device that maps logical addresses to physical addresses
  • the user program deals with logical addresses; it never sees the physical addresses

Dynamic Loading

  • a routine is loaded into memory only when it is called (not at load time)
    • unused routines are never loaded
    • better memory-space utilization
    • implemented through program design; no special OS support is required:

      #include <dlfcn.h>
      #include <stdio.h>

      int main() {
          double (*cosine)(double);
          void* handle = dlopen("/lib/libm.so.6", RTLD_LAZY);
          cosine = (double (*)(double)) dlsym(handle, "cos");
          printf("%f\n", (*cosine)(2.0));
          dlclose(handle);
          return 0;
      }

      (link with -ldl)

Linking

  • Static Linking: libraries are combined with the program's object code into the binary image at load time
    • results in a larger executable file and duplicated library code across programs
  • Dynamic Linking: linking is postponed until execution time
    • only one copy of the library code in memory, shared by all processes
    • the library must be compiled as a dynamic link library
    • a stub is included in the executable to ask the OS to load the library when needed
    • stub call → check whether the library is already in memory → if not, load it → jump to the library routine

Swapping

  • a process can be swapped temporarily out of memory to a backing store (disk) and then brought back into memory for continued execution
    • also used by medium-term scheduling, not by context switching
  • Backing Store: a chunk of disk, separated from the file system, to provide direct access to memory images
  • Advantages
    • free up memory
    • swap lower-priority processes out with a higher one
  • Swap back: if binding is done at compile/load time, the process must be swapped back into the same memory location
  • A process to be swapped out MUST be completely idle; pending I/O is the problem (the device may still transfer data into the process's memory)
    • solutions:
    1. Never swap a process with pending I/O
    2. Do I/O only through OS buffers (the user process can then be swapped out while the OS performs I/O on its behalf); data is copied to/from the user process after it is swapped back in
    • I/O device → I/O controller (own buffer) → OS buffer → user process
  • Major problem: swap time is long, proportional to the amount of memory swapped
    • solution: use demand paging instead of swapping entire process

Contiguous Memory Allocation

  • Fixed-partition allocation
    • each process loads into one partition of fixed size
    • degree of multiprogramming = # of partitions
  • Variable-partition allocation
    • Hole: block of contiguous free memory, (scattered in memory)
    • First-fit: find the first hole that fits
    • Best-fit: find the smallest hole that fits
    • Worst-fit: find the largest hole
  • Buddy Cell Allocation Algorithm
    • a fast heuristic approximation of the best-fit policy
    • recursively split memory in halves until the smallest block that fits the request is found
    • the two halves of a split are called "buddies" (same size, contiguous); free buddies can be merged back later
    • search time: O(log n)
    • merge and split: O(1)
  • External Fragmentation
    • total free memory is big enough, but it is not contiguous
    • occurs only in variable-partition allocation
    • Solution: compaction
      • shuffle memory contents to place all free memory together in one large block
      • only possible if relocation is dynamic, done at execution time
  • Internal Fragmentation
    • allocated memory may be slightly larger than requested memory
    • occurs only in fixed-partition allocation

Paging -- Non-Contiguous Memory Allocation

  • Divide physical memory into fixed-size blocks called frames
  • Divide logical memory into blocks of the same size called pages
  • to run a program of size n pages, need to find n free frames and load the program
  • Benefits
    • non-contiguous allocation
    • no external fragmentation
    • limited internal fragmentation
    • provide shared memory/page
  • Page Table: maps the base of each page (page number) to the base of each frame
    • page table only includes pages owned by a process
    • a process cannot access memory outside its allocated pages (mem protection)
    • page number p: identifies a page (p × page size = base address of the page)
    • frame number f: identifies a frame (f × frame size = base address of the frame)
  • logical address = page number (p) + page offset (d)
    • N bits of p → at most 2^N pages → logical address space of 2^N × page size
    • M bits of d → page size = 2^M bytes
  • physical address = frame number (f) + page offset (d)
  • Address Translation
    • logical address (p, d) → MMU maps p to f through the page table → physical address (f, d)
    • # of pages: determined by the logical memory size of the process
    • # of frames: determined by the size of physical memory
  • free frame linked list: maintained by OS
  • Page Size: a power of 2, from 512B to 16MB; typically 4KB
    • too small -> large page table, more pages per process -> more I/O ops for loading pages
    • too large -> internal fragmentation
  • Page Table
    • kept in memory
    • PTBR (Page-Table Base Register): the physical address of the page table (used by MMU)
      • saved in the PCB (Process Control Block) on a context switch
    • With PTBR, each memory reference requires 2 memory accesses -> solved by TLB
  • TLB (Translation Lookaside Buffer)
    • associative memory: parallel search O(1)
    • limited entries: 64 ~ 2048
    • caches (page number, frame number) pairs as the MMU translates addresses
      • TLB hit: get the frame number from the TLB
      • TLB miss: fall back to the page table via the PTBR (2 memory accesses), then update the TLB
    • flush TLB during context switch
  • EMAT (Effective Memory Access Time)
    • example: 20ns for TLB lookup + 100ns for memory access
    • 70% TLB hit ratio: EMAT = 0.7 × (20 + 100) + 0.3 × (20 + 2 × 100) = 150ns
    • 98% TLB hit ratio: EMAT = 0.98 × (20 + 100) + 0.02 × (20 + 2 × 100) = 122ns
  • Memory Protection: each page is associated with a set of protection bits
    • e.g. valid/invalid bit
      • valid: the page/frame is in the process's logical/physical address space
      • invalid: the page/frame is not in the process's logical/physical address space
    • PTLR (Page-Table Length Register): indicates the size of the page table (used by MMU)
  • Shared Pages
    • reentrant code (pure code): never change during execution, can be shared
    • only one copy in physical memory
    • multiple virtual addresses map to the same physical address

A page table can be huge and hard to keep in memory, e.g. a 4GB (2^32) logical address space with 4KB (2^12) pages needs 1 million (2^20) page-table entries. Three solutions:

Hierarchical Paging

  • break the page table into multiple levels
  • Two-level paging (32-bit address with 4KB page size)
    • 10 bits for the outer page number → 1K (2^10) outer page-table entries
    • 10 bits for the inner page number → 1K (2^10) entries per inner page table
    • 12 bits for offset d → 4KB (2^12) page size
    • → 3 memory accesses
  • 64-bit systems:
    • 12 (p1) + 10 (p2) + 10 (p3) + 10 (p4) + 10 (p5) + 12 (d)
    • require 6 memory accesses

Hashed Page Tables

  • commonly used for address spaces larger than 32 bits
  • virtual page number is hashed into a hash table
  • hash table size varies; the larger the table, the shorter the chain in each entry
  • each entry
    • (virtual page number, frame number, next pointer)
    • the pointers waste memory, and traversing the linked list wastes time

Inverted Page Tables

  • no page table per process
  • maintains a frame table for the whole memory (one entry per frame)
    • each entry: (PID, virtual page number)
  • increases memory access time
    • each access needs to search the whole frame table
    • solution: use hashing for the frame table
  • hard to support shared pages (critical drawback: each frame entry can record only one (PID, virtual page number) pair)

Segmentation -- Non-Contiguous Memory Allocation

  • a program = a collection of segments
  • a segment = a logical unit e.g. main program, procedure, function, data structure, stack
  • logical address = (segment number, offset)
  • Segment Table: maps segment number to (base (4B), limit(4B))
    • STBR (Segment-Table Base Register): the physical address of the segment table
    • STLR (Segment-Table Length Register): indicates the size of the segment table
  • process:
    1. segment number -> segment table -> get base and limit
    2. check offset < limit; otherwise trap (addressing error)
    3. physical address = base + offset
  • shared segments: same logical address in different processes
  • Protection & Sharing
    • Read-only segment: code (code shared at segment level)
    • Read-Write segment: data, heap, stack

Segmentation with Paging

  • CPU --logical address--> Segmentation --linear address--> Paging --physical address--> Memory
  • Example: The Intel Pentium
    • logical address: (selector 16, offset 32)
      • selector: segment number 13, GDT/LDT 1, protection info 2
        • LDT: Local Descriptor Table (2^13 entries)
        • GDT: Global Descriptor Table (2^13 entries)
      • max # of segments per process: 2^14
      • size of a segment: ≤ 2^32 B = 4GB
    • linear address: segment base address(output of segment descriptor) + offset = 32-bit linear address
    • take 32-bit linear address as paging logical address
      • 10 (p1) + 10 (p2) + 12 (d)
      • a page-size flag in the outer-level (page directory) entry selects the page size
        • 0: 4KB page (normal 10 p2 + 12 d)
        • 1: 4MB page (the remaining 22 bits are used directly as the offset d)

[Figure: OS-ch8-example-question.png]