How time flies – it’s been a month since I posted. I’ve been waiting for the PCBs ordered from OSH Park to arrive. They were tracked all the way from the US to my local post office, and somehow Australia Post managed to lose them 😦 The helpful people at OSH Park didn’t hesitate: they put another copy of my board through the next production run at no cost to me. The boards are on their way already – thank you, OSH Park! I can’t rate these guys highly enough!
So, let’s start with the obligatory graphic:
This piece of the project is fairly complex, so I put together this document noting key aspects of the controller and refined it as I went through development and testing. While the state diagram is valid, it’s simplified, as there are several more sequential steps inside each state ‘bubble’. Take a look at the code on GitHub; the sdram.v file contains the main code, and the parameters at line 62 list the states.
While the SDRAM controller is fairly generic, the general premise of the RAM interface is that each bank of the RAM will service a different function. Bank 1 will be the CPC banked RAM (the base 64K will still be FPGA block RAM), while banks 2-4 will service the video buffer and ROMs on a rotating basis. Rotating banks 2-4 through video service will allow the video to be triple buffered, providing the frame-perfect emulation needed for split-mode video effects. Take a look at page 2 of the ‘design’ document for an explanation of how I think this will work.
Following my new code approach, the design worked well with just minimal tweaks. The finite state machine (FSM) that controls the SDRAM turned out to be fairly simple. There’s a great write-up of SDRAM operation on the FPGA4Fun site. The only complexity turns out to be the refresh process. This happens regularly, 4096 times every 64 ms (about once every 15.6 µs), in accordance with the data sheet for the RAM chip. Unfortunately, a refresh can fall due at the most inconvenient times, such as when the CPC requests the next CPU instruction. So, I tested what would happen if the refresh was required just before the CPU issued a request for the next instruction. Here’s the simulation waveform.
The refresh and read operations happen between the white and red vertical bars, a span of one half-cycle of the CPC 4 MHz clock (125 ns). Pay particular attention to row 2, the state. This is the FSM internal state. Here’s what’s happening:
- Around the white line, the FSM decides that a refresh is due (hidden in Refresh{}).
- All banks are pre-charged (closed), shown as state PREC+
- Refresh is issued, shown as state RFSH
- The command is given time to operate (60 ns), shown as state NOP
- Then the row is opened, shown as state OPEN+
- A read is issued for the correct column, shown as state READ
- Then, after 2 clocks, 8 data words are returned over the next 8 clocks
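The sequence above can be sketched as a simple latency budget. The 4096-in-64 ms refresh rate, the 60 ns refresh wait, the 2-clock read latency and the 4 MHz CPC clock come from this post. The SDRAM clock frequency passed in, and the clock counts for PREC+ and OPEN+, are placeholder assumptions for illustration and are not taken from sdram.v.

```python
import math

# Figures quoted in the post
CPC_CLOCK_HZ = 4_000_000
REFRESH_COUNT = 4096        # refreshes required per window
REFRESH_WINDOW_MS = 64
REFRESH_WAIT_NS = 60        # NOP time after RFSH
READ_LATENCY_CLOCKS = 2     # clocks between READ and the first word

# Hypothetical values, for illustration only (not from sdram.v)
PRECHARGE_CLOCKS = 2
ACTIVATE_CLOCKS = 2


def refresh_interval_us():
    """Average time between refresh commands, in microseconds."""
    return REFRESH_WINDOW_MS * 1000 / REFRESH_COUNT


def half_cycle_ns():
    """Length of one half-cycle of the CPC clock, in nanoseconds."""
    return 1e9 / CPC_CLOCK_HZ / 2


def worst_case_sequence(sdram_clock_hz):
    """Return (state, clocks) pairs for a refresh followed by a read."""
    period_ns = 1e9 / sdram_clock_hz
    return [
        ("PREC+", PRECHARGE_CLOCKS),
        ("RFSH", 1),
        ("NOP", math.ceil(REFRESH_WAIT_NS / period_ns)),
        ("OPEN+", ACTIVATE_CLOCKS),
        ("READ", 1),
        ("latency", READ_LATENCY_CLOCKS),
    ]


def first_word_latency_ns(sdram_clock_hz):
    """Time from the start of the precharge to the first data word."""
    clocks = sum(n for _, n in worst_case_sequence(sdram_clock_hz))
    return clocks * 1e9 / sdram_clock_hz
```

Comparing `first_word_latency_ns(f)` against `half_cycle_ns()` for a candidate SDRAM clock `f` gives a quick check of whether the first word can arrive within the half-cycle, once the real clock counts are substituted for the placeholders.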
The first word is returned on the falling edge of the CPC 4MHz clock. So this means that even if the worst possible timing conditions happen, the data will still be returned in the T1 cycle. It’s not required until the end of the T2 cycle, so there is no need for any wait states that would ruin the timing accuracy of the CPC2.
There is one other timing sequence that could cause an issue: the video module requests an 8-word burst just before the CPC request, and a refresh falls due during that burst. The CPC request would then have to wait for both the video burst and the refresh to finish before it can open the row and read the CPC program data. However, this would still complete before the end of T1, so I don’t expect to need to test this.
If the CPU requests data during a refresh, the request is queued. If a further request is placed before the first request has completed, it’s simply ignored.
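The queueing rule can be modelled as a one-deep request slot. This is only a behavioural sketch in Python; the class and method names are hypothetical and do not correspond to signals in sdram.v.

```python
class PendingRequest:
    """One-deep request slot: holds a single outstanding CPU request."""

    def __init__(self):
        self.address = None  # None means no request is outstanding

    def submit(self, address):
        """Queue a request; return False if one is already outstanding."""
        if self.address is not None:
            return False  # a further request is simply ignored
        self.address = address
        return True

    def complete(self):
        """Free the slot when the read finishes and return its address."""
        address, self.address = self.address, None
        return address
```

A request that arrives during a refresh stays in the slot until the controller services it and calls `complete()`; anything submitted in between is dropped rather than buffered.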
Now that I have the low-level SDRAM access sorted, the next step is to tidy up the code then write the caching controller. This controller will provide a number of conduits for the CPC RAM, ROM and Video and provide the appropriate byte interface for the CPC.
I hope my OSH Park boards will have arrived by then!
P.S. Grimware.org is dead! For whatever reason, Grimware is offline and has been for a few weeks. This was an amazing source of information for all things CPC. The author of the site, Grim, was a CPC demoscener, and I’ve reached out to some of his buddies to try to find out where Grim’s gone, but there has been no response so far. Luckily the Internet Archive has crawled the pages so his legacy lives on, but it’s a sad turn of events.
In the old days, e.g. in the Apple II, DRAM was accessed by the CPU and the video generator on opposite phases of the main clock, and the video generator opened the next DRAM row on each access, effectively doing refresh for free.
I’m just curious if it could be possible to use video generation to refresh SDRAM? Like conclude each SDRAM access from video generator with precharge and use LSB video generator counter bits as row number? Or would that be too slow? Just thinking out loud, u know… 🙂 I’d need to do lil’ math.
Hi codepainters, you’re right, using the video circuitry to refresh the DRAM would work with the old asynchronous DRAM. Simply closing a row in async DRAM would initiate a refresh of that row, so the video would have the effect of refreshing the DRAM. This assumes the entire DRAM is scanned for the video output, thereby consuming all memory. If the entire memory is scanned for every frame, a full refresh takes one frame period, 1 s / 60 FPS or about 17 ms, which is well within the requirements for most DRAM.
Synchronous DRAM works differently, using an instruction table, so it requires a special instruction to refresh the row. Also, modern SDRAM tends to have a very large capacity, so a video frame is unlikely to consume all of the SDRAM, leaving parts of the chip without a refresh. It’s also quite inefficient to refresh after each row access, severely limiting the SDRAM’s usefulness.
A useful read is the Z80 reference guide that discusses the built-in DRAM refresh for older async chips as this is a good solution to refresh between memory read/write accesses.
Hi! Apologies for a late follow-up, but I’m quite busy these days.
I believe I understand SDRAM command-based interface, but I think I wasn’t clear enough in my previous comment. Let me explain my way of thinking then. Please forgive stating the obvious occasionally, I just want the explanation to be complete and easy to follow. Still, it’s a kind of thought experiment, I may as well be totally wrong 🙂
1. Speaking of good old DRAM chips – as we know, there are a few ways to refresh: RAS-only (where you have to supply a row address; I tend to think about it as “read without actual read”), CAS-before-RAS and hidden refresh, where an internal row counter is used. Let’s call all three methods an “explicit refresh”.
But here’s a nice quote from the TMS4464 datasheet: “A normal read or write cycle will refresh all bits in each row that is selected”, let’s call it “implicit refresh”. This is exactly the feature that makes the “refresh by video generator” trick possible at all.
For me it makes perfect sense – as far as I understand, opening a row is destructive – it basically reads the DRAM cells row via sense amplifiers into the internal row buffer, discharging the cell capacitors at the same time. The buffer has to be written back to restore previous content (I’m not sure when exactly it happens – when !RAS goes back high, I suppose), so effectively the row is refreshed on each access.
Obviously, in typical applications, DRAM access patterns are, well, random, and that’s exactly why some form of explicit refresh is needed.
2. Let’s switch to SDRAM now. I believe the inner working of the RAM array is pretty much the same, and the difference is in the synchronous interface mostly. ACTIVATE command corresponds to !RAS going low, READ/WRITE maps to !CAS going low, and PRECHARGE is equivalent to !RAS going high again.
Obviously ACTIVATE + READ + PRECHARGE would also refresh the row (as reading a row is destructive). I can’t find any hard evidence for that, but I believe that ACTIVATE + PRECHARGE is enough to refresh the row. That would correspond with RAS-only refresh in the old DRAMs.
3. If the above is true, then why would anyone need the AUTO REFRESH command (let’s ignore SELF REFRESH, as it’s irrelevant here), if ACTIVATE/PRECHARGE is enough for explicit refresh?
I believe it is pure convenience – SDRAM maintains its own row refresh counter, so the user doesn’t need to. It’s pretty much the same as CAS-before-RAS explicit refresh for old DRAMs. And perhaps it shaves a few clock cycles.
Even one of the Micron datasheets seems to point out this analogy: “AUTO REFRESH is used during normal operation of the SDRAM and is analogous to CAS#-BEFORE-RAS# (CBR) refresh in conventional DRAMs.”
4. But how about ensuring every row is accessed during video frame generation?
Well, old computers used tricks like address line swapping to achieve that – video buffer never occupies whole address space, anyway, but who says we must use MSBs of the address as row address and LSBs as column address?
AFAIK you use the AS4C8M16SA 8Mx16 chip, so let’s assume 2 bank bits, 12 bits for row, 9 bits for column, 64 ms to refresh.
Let’s map the address word A[22:0] like this:
Bank: A[16:15]
Row: A[14:3]
Column: A22 A21 A20 A19 A18 A17 A2 A1 A0
With such mapping (or interleaving, if you will) accessing a continuous block of 2^17 addresses = 128kB (2^17 aligned) is enough to touch every row in every bank, effectively refreshing the whole chip. Also, 3 LSBs of the address are mapped to 3 LSBs of the column number, so 8-byte bursts are still possible. And if it’s done on each video frame… 🙂
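The mapping described in this comment can be checked with a short sketch. It is written in Python purely as an illustration of the bit layout above; the function names are hypothetical.

```python
def split_address(a):
    """Split a 23-bit address A[22:0] into (bank, row, column)."""
    bank = (a >> 15) & 0x3                          # A[16:15]
    row = (a >> 3) & 0xFFF                          # A[14:3]
    column = (((a >> 17) & 0x3F) << 3) | (a & 0x7)  # A[22:17], then A[2:0]
    return bank, row, column


def rows_touched(base, length):
    """Set of (bank, row) pairs opened while scanning [base, base + length)."""
    # Step by 8: the three address LSBs only change the column, so each
    # aligned group of 8 addresses stays within a single row.
    return {split_address(a)[:2] for a in range(base, base + length, 8)}
```

For example, `len(rows_touched(0, 1 << 17))` evaluates to 16384, which is 4 banks x 4096 rows, so a 2^17-aligned block of 2^17 addresses does open every row in every bank under this mapping.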
I’m not sure if that would work for your application, but I just wanted to illustrate the idea anyway; I hope it’s a bit clearer now. It pretty much boils down to the assumptions I make in point 2 above – I will definitely try to verify it on one of my FPGA boards.
And btw, I can’t wait for new posts on your blog, this project truly rocks! I’m also into retro stuff these days, implementing uPD765 emulation on Cypress PSoC4 recently 🙂 Unfortunately I’ve too little time for my side projects (parenting + working at start-up leaves little to no free time).
Thanks Codepainters, that’s a really interesting concept. Although the datasheet for the SDRAM I used does not mention refreshing on the close row command, I believe you’re correct with your technical description of the destructive read. I recall reading this somewhere and so if you touch every row, 16384 (4096 rows x 4 banks) or 32768 (8192 rows x 4 banks) of them depending upon the part used, it should keep the capacitors refreshed.
The obvious advantage of using the auto refresh is that it refreshes each row in all four banks at the same time, reducing the number of row refreshes to 4096 or 8192 and conserving the interface bandwidth.
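The difference in interface load can be put into numbers with a small sketch. The row and bank counts and the 64 ms window are the figures quoted in this thread; applying the same 64 ms window to the 8192-row part is an assumption, as is the premise (still unverified in this discussion) that ACTIVATE + PRECHARGE alone refreshes a row.

```python
REFRESH_WINDOW_MS = 64  # refresh window quoted in the thread
BANKS = 4


def refresh_ops_per_window(rows_per_bank, auto_refresh):
    """Operations needed per window to refresh every row in every bank."""
    # AUTO REFRESH covers the same row in all banks with one command;
    # ACTIVATE + PRECHARGE touches one row in one bank at a time.
    return rows_per_bank if auto_refresh else rows_per_bank * BANKS


def mean_interval_us(rows_per_bank, auto_refresh):
    """Average spacing between refresh operations, in microseconds."""
    ops = refresh_ops_per_window(rows_per_bank, auto_refresh)
    return REFRESH_WINDOW_MS * 1000 / ops
```

For the 4096-row part this gives one AUTO REFRESH every 15.625 µs on average, against one row access every 3.90625 µs if each row in each bank has to be opened individually.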
I edited an earlier version of this comment because, WOW, I finally just understood what you were saying. That’s pretty smart. I’d need 5 of the lower 9 bits to remain intact to facilitate longer bursts for efficiency. This would remove the need for discrete refresh cycles. I’ll test that out when I get the framebuffer working. Thanks for the suggestion!
I’ll definitely try to experiment with this concept on the BeMicro MAX10 board (Altera/Intel MAX10M08 + 8MB SDRAM). I’ll let you know if it worked for me.
Good point about auto refresh – I overlooked that it refreshes all banks at the same time 🙂