5.2 Cache memory: design, mapping, and replacement policies
4 min read • August 13, 2024
Cache memory is a crucial component in modern computer systems, bridging the performance gap between fast processors and slower main memory. It leverages the principle of locality to store frequently accessed data, reducing memory access latency and improving overall system performance.
Designing an effective cache involves carefully balancing parameters like cache size, block size, and associativity. Various mapping techniques and replacement policies are used to optimize cache performance, each with its own trade-offs between simplicity, flexibility, and efficiency.
Cache Memory Purpose and Benefits
Bridging the Performance Gap
Cache memory is a small, fast memory located close to the processor that stores frequently accessed data and instructions
Reduces the average time to access memory by providing faster access to frequently used data
Helps bridge the performance gap between the fast processor and slower main memory
Reducing Memory Access Latency
Cache memory reduces the number of accesses to main memory
Improves overall system performance by reducing memory access latency
Increases the effective memory bandwidth by servicing more memory requests from the cache
Principle of Locality
The principle of locality, which includes both temporal and spatial locality, makes cache memory effective
Temporal locality refers to the tendency of a processor to access the same memory locations repeatedly within a short period (loops, frequently called functions)
Spatial locality refers to the tendency of a processor to access memory locations that are close to each other (sequential instructions, array elements)
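Both kinds of locality can be seen in ordinary code. In this illustrative sketch (the function name is our own), the accumulator exhibits temporal locality and the in-order array traversal exhibits spatial locality:

```python
def sum_array(values):
    total = 0                      # 'total' is reused every iteration: temporal locality
    for i in range(len(values)):   # adjacent elements read in order: spatial locality
        total += values[i]
    return total

print(sum_array([1, 2, 3, 4]))  # -> 10
```

A hardware cache benefits here twice: the loop body is fetched repeatedly from the instruction cache, and each miss on the array brings in a whole block of adjacent elements.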
Cache Memory Design Parameters
Cache Size
Cache size refers to the total capacity of the cache memory, typically measured in bytes or kilobytes (4KB, 8KB, 16KB)
A larger cache size can store more data but may increase access latency and power consumption
The optimal cache size depends on factors such as the processor architecture, workload characteristics, and power and area constraints
Block Size
Block size, also known as line size, is the unit of data transfer between the main memory and the cache
A larger block size can exploit spatial locality by bringing in more adjacent data from memory
However, a larger block size may lead to a higher miss penalty and increased memory traffic due to bringing in unnecessary data
Common block sizes include 32 bytes, 64 bytes, and 128 bytes
Associativity
Associativity determines how many possible locations in the cache a particular block from main memory can be mapped to
Direct-mapped cache: Each block in main memory maps to only one possible location in the cache
Fully associative cache: A block can be placed anywhere in the cache, providing the most flexibility but requiring more complex hardware for tag comparison
N-way set-associative cache: The cache is divided into sets, each containing N blocks (2-way, 4-way, 8-way)
A block can be placed in any of the N locations within a specific set, providing a balance between flexibility and hardware complexity
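These three parameters together determine how a memory address is split into tag, set index, and block offset bits. The sketch below assumes a hypothetical cache whose size, block size, and set count are all powers of two (the function name and the example address are our own):

```python
import math

def address_fields(address, cache_bytes, block_bytes, ways):
    """Split an address into (tag, set index, block offset) for a
    hypothetical N-way set-associative cache with power-of-two sizes."""
    num_sets = cache_bytes // (block_bytes * ways)
    offset_bits = int(math.log2(block_bytes))
    index_bits = int(math.log2(num_sets))
    offset = address & (block_bytes - 1)              # low bits select the byte
    set_index = (address >> offset_bits) & (num_sets - 1)  # middle bits pick the set
    tag = address >> (offset_bits + index_bits)       # remaining high bits are the tag
    return tag, set_index, offset

# 4 KB cache, 64-byte blocks, 4-way => 16 sets (6 offset bits, 4 index bits)
print(address_fields(0x1A2B3C, 4096, 64, 4))  # -> (1674, 12, 60)
```

Note that for a direct-mapped cache `ways` is 1, and for a fully associative cache there is a single set, so the index field disappears entirely.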
Cache Mapping Techniques: Direct vs Associative vs Set-Associative
Direct Mapping
Direct mapping is the simplest cache mapping technique, where each block in main memory maps to a unique location in the cache based on its memory address
Offers fast access times but may lead to higher conflict misses
The cache location is determined by the formula: Cache_Index = Block_Address mod Number_of_Blocks_in_Cache
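The formula above can be computed directly; this small sketch (function name is our own) also shows how two different block addresses can collide on the same index, producing a conflict miss:

```python
def direct_mapped_index(block_address, num_blocks):
    # Cache_Index = Block_Address mod Number_of_Blocks_in_Cache
    return block_address % num_blocks

# With 8 cache blocks, block addresses 3 and 11 both map to index 3,
# so alternating accesses to them evict each other (conflict misses)
print(direct_mapped_index(3, 8))   # -> 3
print(direct_mapped_index(11, 8))  # -> 3
```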
Associative Mapping
Associative mapping allows a block to be placed anywhere in the cache, providing the most flexibility in terms of block placement
Requires comparing the tag of each block in the cache simultaneously, which increases hardware complexity and power consumption
Provides the lowest miss rate but at the cost of higher access latency and hardware overhead
Set-Associative Mapping
Set-associative mapping is a compromise between direct and associative mapping
The cache is divided into sets, and each set contains multiple blocks (2-way, 4-way, 8-way)
A block can be placed in any location within its corresponding set, determined by a portion of its memory address
Reduces conflict misses compared to direct mapping while maintaining lower hardware complexity than fully associative mapping
The cache location is determined by the formulas:
Set_Index = Block_Address mod Number_of_Sets
Tag = Block_Address / Number_of_Sets (integer division)
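These two formulas can be sketched in a few lines (the function name and the 16-set example are our own):

```python
def set_index_and_tag(block_address, num_sets):
    # Set_Index = Block_Address mod Number_of_Sets
    # Tag       = Block_Address // Number_of_Sets (integer division)
    return block_address % num_sets, block_address // num_sets

# 16 sets: block address 37 lands in set 5, stored with tag 2
print(set_index_and_tag(37, 16))  # -> (5, 2)
```

On a lookup, the hardware computes the set index, then compares the tag against the N tags stored in that set in parallel.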
Cache Replacement Policies: LRU vs FIFO vs Random
Least Recently Used (LRU)
LRU policy replaces the block that has been accessed least recently
Assumes that blocks that have been recently accessed are more likely to be accessed again in the near future
Generally provides good performance by exploiting temporal locality
Requires additional hardware to track the access order of blocks (timestamp or stack)
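The behavior of LRU within one cache set can be modeled with an ordered map, which plays the role of the hardware's access-order tracking (this is an illustrative toy model, not a hardware description; the class name is our own):

```python
from collections import OrderedDict

class LRUSet:
    """Toy model of one N-way cache set with LRU replacement."""
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # tags ordered from LRU (front) to MRU (back)

    def access(self, tag):
        if tag in self.blocks:               # hit: promote to most-recently-used
            self.blocks.move_to_end(tag)
            return True
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)  # evict the least recently used tag
        self.blocks[tag] = None
        return False

s = LRUSet(ways=2)
print([s.access(t) for t in ["A", "B", "A", "C", "B"]])
# -> [False, False, True, False, False]  (accessing A again protects it; B is evicted by C)
```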
First-In, First-Out (FIFO)
FIFO policy replaces the block that was brought into the cache earliest, regardless of its access history
Simpler to implement than LRU but may not perform as well in exploiting locality
Maintains a queue of blocks in the order they were brought into the cache
The oldest block in the queue is replaced when a new block needs to be cached
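A FIFO set is the same toy model with a queue instead of an access-ordered map (again an illustrative sketch, class name our own). The key difference from LRU is that a hit does not update the order:

```python
from collections import deque

class FIFOSet:
    """Toy model of one N-way cache set with FIFO replacement."""
    def __init__(self, ways):
        self.ways = ways
        self.queue = deque()        # tags in arrival order; leftmost is oldest

    def access(self, tag):
        if tag in self.queue:       # hit: order is NOT updated under FIFO
            return True
        if len(self.queue) >= self.ways:
            self.queue.popleft()    # evict the oldest block, even if recently used
        self.queue.append(tag)
        return False

s = FIFOSet(ways=2)
print([s.access(t) for t in ["A", "B", "A", "C", "A"]])
# -> [False, False, True, False, False]
```

On this trace, FIFO evicts A when C arrives because A was brought in first, even though it was just reused; LRU on the same trace would have evicted B and turned the final access to A into a hit.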
Random Replacement
The random replacement policy selects a block at random to be replaced when the cache is full
Simplest to implement but may not effectively capture locality, leading to suboptimal performance
Does not require any additional hardware or tracking mechanisms
May be used in low-cost or low-power systems where simplicity is prioritized over performance
Advanced Replacement Policies
Advanced replacement policies, such as Pseudo-LRU and Least Frequently Used (LFU), aim to balance performance and hardware complexity
Pseudo-LRU approximates the behavior of LRU using a binary tree structure, reducing the hardware overhead
LFU replaces the block that has been accessed the least number of times, prioritizing frequently accessed blocks
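One common form of Pseudo-LRU is tree-PLRU, where a 4-way set needs only 3 bits instead of a full LRU ordering: each bit points toward the half of the set that should be victimized next, and an access flips the bits on its path to point away from itself. The sketch below is a toy model under that assumption (class and method names are our own):

```python
class TreePLRUSet:
    """Toy 4-way cache set with tree Pseudo-LRU: 3 bits approximate LRU."""
    def __init__(self):
        self.tags = [None, None, None, None]
        self.bits = [0, 0, 0]  # bits[0]: root, bits[1]: ways 0/1, bits[2]: ways 2/3

    def _touch(self, way):
        # Point the bits on the accessed path AWAY from this way
        if way < 2:
            self.bits[0], self.bits[1] = 1, 1 - way
        else:
            self.bits[0], self.bits[2] = 0, 3 - way

    def _victim(self):
        # Follow the bits to the approximate least-recently-used way
        if self.bits[0] == 0:
            return 0 if self.bits[1] == 0 else 1
        return 2 if self.bits[2] == 0 else 3

    def access(self, tag):
        if tag in self.tags:                 # hit: refresh the tree bits
            self._touch(self.tags.index(tag))
            return True
        way = self.tags.index(None) if None in self.tags else self._victim()
        self.tags[way] = tag                 # miss: fill a free way or evict the victim
        self._touch(way)
        return False

s = TreePLRUSet()
for t in ["A", "B", "C", "D"]:
    s.access(t)          # fill the set
s.access("A")            # touching A points the tree bits away from it
s.access("E")            # miss: the victim chosen is not A
```

With only 3 bits the tree cannot represent every LRU ordering, so the evicted block is not always the true least-recently-used one, which is exactly the performance-versus-hardware trade-off these policies make.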
The choice of replacement policy depends on factors such as cache size, associativity, workload characteristics, and design constraints (performance, power, area)