
The write-through cache is the simplest way to implement cache coherency. If the cache line is written to, the processor immediately also writes the cache line into main memory. This ensures that, at all times, the main memory and cache are in sync. The cache content could simply be discarded whenever a cache line is replaced. This cache policy is simple but not very fast. A program which, for instance, modifies a local variable over and over again would create a lot of traffic on the FSB even though the data is likely not used anywhere else and might be short-lived.

Transmitting an entire cache line just because a word in the line has been written is wasteful if the next operation modifies the next word. One can easily imagine that this is a common occurrence; the memory for horizontally neighboring pixels on a screen consists in most cases of neighbors, too. As the name suggests, write-combining combines multiple write accesses before the cache line is written out. In ideal cases the entire cache line is modified word by word and, only after the last word is written, the cache line is written to the device. This can speed up access to RAM on devices significantly.
The write-back policy is more sophisticated. Here the processor does not immediately write the modified cache line back to main memory. Instead, the cache line is only marked as dirty. When the cache line is dropped from the cache at some point in the future the dirty bit will instruct the processor to write the data back at that time instead of just discarding the content.

Write-back caches have the chance to be significantly better performing, which is why most memory in a system with a decent processor is cached this way. The processor can even take advantage of free capacity on the FSB to store the content of a cache line before the line has to be evacuated. This allows the dirty bit to be cleared and the processor can just drop the cache line when the room in the cache is needed.

But there is a significant problem with the write-back implementation. When more than one processor (or core or hyper-thread) is available and accessing the same memory it must still be assured that both processors see the same memory content at all times. If a cache line is dirty on one processor (i.e., it has not been written back yet) and a second processor tries to read the same memory location, the read operation cannot just go out to the main memory. Instead the content of the first processor's cache is needed.

Finally there is uncacheable memory. This usually means the memory location is not backed by RAM at all. It might be a special address which is hardcoded to have some functionality implemented outside the CPU. For commodity hardware this most often is the case for memory-mapped address ranges which translate to accesses to cards and devices attached to a bus (PCIe, etc.). On embedded boards one sometimes finds such a memory address which can be used to turn an LED on and off. Caching such an address would obviously be a bad idea. LEDs in this context are used for debugging or status reports and one wants to see the change as soon as possible. The memory on PCIe cards can change without the CPU's interaction, so this memory should not be cached.

Ulrich Drepper Version 1.0 25

3.3.4 Multi-Processor Support

In the previous section we have already pointed out the problem we have when multiple processors come into play. Even multi-core processors have the problem for those cache levels which are not shared (at least the L1d).

It is completely impractical to provide direct access from one processor to the cache of another processor. The connection is simply not fast enough, for a start. The practical alternative is to transfer the cache content over to
the other processor in case it is needed. Note that this also applies to caches which are not shared on the same processor.

The question now is when does this cache line transfer have to happen? This question is pretty easy to answer: when one processor needs a cache line which is dirty in another processor's cache for reading or writing. But how can a processor determine whether a cache line is dirty in another processor's cache? Assuming it just because a cache line is loaded by another processor would be suboptimal (at best). Usually the majority of memory accesses are read accesses and the resulting cache lines are not dirty. Processor operations on cache lines are frequent (of course, why else would we have this paper?) which means broadcasting information about changed cache lines after each write access would be impractical.

What developed over the years is the MESI cache coherency protocol (Modified, Exclusive, Shared, Invalid). The protocol is named after the four states a cache line can be in when using the MESI protocol:

Modified: The local processor has modified the cache line. This also implies it is the only copy in any cache.

Exclusive: The cache line is not modified but known to not be loaded into any other processor's cache.

Shared: The cache line is not modified and might exist in another processor's cache.

Invalid: The cache line is invalid, i.e., unused.

Certain operations a processor performs are announced on external pins and thus make the processor's cache handling visible to the outside. The address of the cache line in question is visible on the address bus. In the following description of the states and their transitions (shown in Figure 3.18) we will point out when the bus is involved.

Initially all cache lines are empty and hence also Invalid. If data is loaded into the cache for writing the cache line changes to Modified. If the data is loaded for reading the new state depends on whether another processor has the cache line loaded as well. If this is the case then the new state is Shared, otherwise Exclusive.

If a Modified cache line is read from or written to on the local processor, the instruction can use the current cache content and the state does not change. If a second processor wants to read from the cache line the first processor has to send the content of its cache to the second processor and then it can change the state to Shared. The data sent to the second processor is also received and processed by the memory controller which stores the content in memory. If this did not happen the cache line could not be marked as Shared. If the second processor wants to write to the cache line the first processor sends the cache line content and marks the cache line locally as Invalid. This is the infamous "Request For Ownership" (RFO) operation. Performing this operation in the last level cache, just like the I→M transition, is comparatively expensive. For write-through caches we also have to add the time it takes to write the new cache line content to the next higher-level cache or the main memory,