Cache (Screen) Grab
This screen grab shows one of the key process steps in the caching controller: the cache line replacement. The red bracket indicates the cache I/O ports and some key internal state variables, and the blue bracket indicates the data cache I/O ports on the dual-port RAM. The screen grab shows a single byte write into a full cache line. I’ve illustrated a few key parts of the waveform with the white descriptor labels:
- Cache Miss/Flush – this is where the CPU requests a byte that is not in the cache. The cache state machine determines that both ways of the cache are full, and so a victim has to be selected for flushing. In this case, Way1 (indicated by hit_way) is flushed to be replaced with the cached data from main memory.
- To the right of that, we can see 8 words written to SDRAM through the word data output port (wdat_o), as soon as the SDRAM controller is ready for it (wvalid_i).
- The word port of the data cache (W Cache) outputs the 8 words in sequence straight onto the Dq lines.
- Then there’s a period of two clocks that allows the SDRAM to ‘settle’ the write (see the data sheet for information).
- Immediately following is the read instruction to read in the memory locations requested. In the test scenario above, there is a short delay before the data is returned because the signals have to make their way through the output ports, through the arbiter and into the SDRAM controller. (There’s definitely some room for optimisation here, with 11 clocks wasted).
- When the read instruction is processed, 8 words of data are returned.
- In the final step (top right), the single byte of data is stored into the newly set up cache line.
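The sequence above (miss, victim flush, refill, then the byte store) can be sketched as a small software model. This is only an illustration, not the controller's HDL: the two ways and the 8-word line come from the waveform, but the 16-bit word size, the number of sets, the "write back only if dirty" rule and the one-bit LRU victim choice are all assumptions, and the names `TwoWayCache`, `Line` and `log` are hypothetical.

```python
from dataclasses import dataclass, field

WORDS_PER_LINE = 8   # from the waveform: 8 words flushed, 8 words refilled
BYTES_PER_WORD = 2   # assumption: 16-bit SDRAM data bus
LINE_BYTES = WORDS_PER_LINE * BYTES_PER_WORD
NUM_SETS = 4         # assumption: tiny, for illustration only
NUM_WAYS = 2         # from the post: two ways


@dataclass
class Line:
    valid: bool = False
    dirty: bool = False
    tag: int = 0
    data: bytearray = field(default_factory=lambda: bytearray(LINE_BYTES))


class TwoWayCache:
    """Hypothetical write-back model of a two-way set-associative cache."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.sets = [[Line() for _ in range(NUM_WAYS)] for _ in range(NUM_SETS)]
        self.lru = [0] * NUM_SETS  # way to evict next (assumed policy)
        self.log = []              # (event, way) pairs, like labels on the waveform

    def _split(self, addr):
        offset = addr % LINE_BYTES
        index = (addr // LINE_BYTES) % NUM_SETS
        tag = addr // (LINE_BYTES * NUM_SETS)
        return tag, index, offset

    def _line_base(self, tag, index):
        return (tag * NUM_SETS + index) * LINE_BYTES

    def _lookup(self, addr):
        tag, index, offset = self._split(addr)
        ways = self.sets[index]
        for way, line in enumerate(ways):
            if line.valid and line.tag == tag:
                self.log.append(("hit", way))
                self.lru[index] = 1 - way
                return line, offset
        # Miss: use an empty way if there is one, otherwise evict the LRU way.
        way = next((w for w, ln in enumerate(ways) if not ln.valid),
                   self.lru[index])
        victim = ways[way]
        if victim.valid and victim.dirty:
            # Flush: write the victim's 8 words back to main memory.
            base = self._line_base(victim.tag, index)
            self.memory[base:base + LINE_BYTES] = victim.data
            self.log.append(("flush", way))
        # Refill: read the requested line's 8 words from main memory.
        base = self._line_base(tag, index)
        victim.data[:] = self.memory[base:base + LINE_BYTES]
        victim.valid, victim.dirty, victim.tag = True, False, tag
        self.log.append(("refill", way))
        self.lru[index] = 1 - way
        return victim, offset

    def write_byte(self, addr, value):
        line, offset = self._lookup(addr)
        line.data[offset] = value & 0xFF  # final step: store the single byte
        line.dirty = True

    def read_byte(self, addr):
        line, offset = self._lookup(addr)
        return line.data[offset]


# Usage: fill both ways of set 0, touch way 0, then write a third line.
mem = bytearray(1024)
cache = TwoWayCache(mem)
stride = LINE_BYTES * NUM_SETS      # addresses this far apart share a set
cache.write_byte(0, 0x11)           # fills way 0
cache.write_byte(stride, 0x22)      # fills way 1
cache.read_byte(0)                  # hit on way 0, so way 1 becomes the victim
cache.write_byte(2 * stride, 0x33)  # set is full: flush way 1, refill, store
print(cache.log[-2:])
```

In this model the last write produces a flush of way 1 followed by a refill of way 1, which mirrors the order of events in the screen grab; it says nothing about timing.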
While this seems a bit heavyweight, it all happens in a single T-cycle (clock cycle) on the 4MHz Z80 CPU. There’s still another 1.5 clock cycles at 4MHz available before the data is read into the CPU, so there’s plenty of margin for issues. I’ll look to optimise and tune this so that the minimum number of clocks are wasted.
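As a back-of-the-envelope check on that budget, the clocks explicitly named in the waveform description can be added up and compared against one 4MHz T-cycle. This is a sketch under assumptions: it treats each burst as one word per clock, it counts only the four phases listed (ignoring row activate, CAS latency, precharge and the final byte store), and the phase names and helper functions are hypothetical. The result is therefore only a floor on the controller clock, not a statement of the frequency actually used.

```python
Z80_CLOCK_HZ = 4_000_000  # from the post: 4MHz Z80

# Clock counts taken from the waveform description; one word per clock
# in each burst is an assumption.
PHASE_CLOCKS = {
    "flush burst (8 words written)": 8,
    "write settle": 2,
    "delay before read data returns": 11,
    "refill burst (8 words read)": 8,
}


def t_cycle_ns(cpu_hz):
    """Length of one CPU clock cycle in nanoseconds."""
    return 1e9 / cpu_hz


def min_controller_mhz(total_clocks, cpu_hz):
    """Slowest controller clock that fits total_clocks into one CPU cycle."""
    return total_clocks * cpu_hz / 1e6


total = sum(PHASE_CLOCKS.values())
print(f"T-cycle: {t_cycle_ns(Z80_CLOCK_HZ):.0f} ns")
print(f"Counted clocks: {total}")
print(f"Floor on controller clock: {min_controller_mhz(total, Z80_CLOCK_HZ):.0f} MHz")

# If the 11 clocks of request latency were removed entirely:
print(f"Floor without the delay: {min_controller_mhz(total - 11, Z80_CLOCK_HZ):.0f} MHz")
```

With the counted phases the floor works out to 116 MHz, and to 72 MHz if the 11-clock delay were eliminated, which gives a feel for how much that optimisation could buy.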
I’ve tested the SDRAM controller from my previous post in a test harness using the DE0-Nano’s SDRAM, although I had to modify the code to work with a 32MB chip instead of the target 16MB. Surprisingly, there were no issues with timing or metastability. I’ll test this cache controller and report back if there are any issues. I do need to work on optimising the code, as there does seem to be a bit of a delay between requests. Otherwise, it’s looking good for the next build!