Retro CPC Dongle – Part 20

Finally, after weeks* of effort, I can write about the SPI interface between the Atmel SAM4S supervisor chip and the support CPU in the FPGA. You’ll recall from my last post that I had this working in simulation, but as anyone who has worked on FPGAs or RTL code knows, a simulation is still a long way from a working configuration. Still, after a bit of work, I managed to get this:

I managed to reliably pass the string ‘ABCD’ (0x41 0x42 0x43 0x44 in hex) across the SPI interface in response to a keypress. The keypress travels from the picocom terminal through the USB serial port of the supervisor chip, through the SPI module of the Atmel SAM chip, across four control and signal lines into the FPGA, and on to the soft Z80 CPU. In response to the incoming data, the software stored in the ‘ROM’ in the supervisor functions reads the transmitted keypress, stores it in memory and returns ‘ABCD’ across the SPI interface by requesting another transfer from the master.

The really cool thing, and the part that took the most time to work through, is that this all happens at 40Mbps and doesn’t require any supervision from either the SAM4S device or the support CPU inside the FPGA. The magic of DMA in the SAM4S master device quietly shuffles bits out of the microcontroller into the FPGA, which in turn accepts each octet and stores it in a FIFO buffer, ready for consumption by the support CPU when it’s ready.

I had to write a completely new SPI slave module in Verilog, because most of the SPI modules you’ll find on GitHub or Opencores are SPI masters, not SPI slaves. The complexity comes from the fact that the SPI clock is both super fast and erratic. While many clocks used in digital design have a carefully controlled frequency and a typically balanced duty cycle (the time the signal is high vs the time the signal is low), the clock in an SPI link may have neither. The only requirement for the SPI clock is that it has a reasonable rise and fall time, to minimize the is-it-on or is-it-off indeterminate state that would cause problematic transitions.

In addition, the SPI clock is sourced from the master, which in this case is the Atmel SAM4S chip, which means that there is a second clock domain in the FPGA configuration. I needed to run the SPI logic purely from the SPI clock and not rely on any additional clock cycles to manage writing data to the FIFO. This proved to be a challenge, as I wrote about in my last post, but I ‘re-invented’ some clever techniques to make this happen. I’ll go into the workings of the read/write signal now because someone might find this useful.

The problem is this: on the falling edge of the last bit of the transferred byte, I want to write the data that was captured by the incoming shift register into the FIFO. That in itself is not a problem and can be handled with simple edge-triggered code:

// Assumes bit_count is at least 4 bits wide; a 3'd8 constant would truncate to 0.
always @(negedge spi_clk) if( bit_count == 4'd8 ) write_signal <= 1'b1;

After the required minimum period for the write_signal to take effect, I want to return it to a low state to avoid glitches or other weirdness. You could use a double-edged always block to return this signal to its inactive state:

// No posedge/negedge keyword: in simulation this block runs on both edges of spi_clk.
always @(spi_clk) write_signal <= (bit_count == 4'd8) ? 1'b1 : 1'b0;

This worked fine in simulation and handled all of the tests that I threw at it, but the Quartus assembly process choked on it, and in silicon the always block fired erratically, or fired when it should not have. I simply could not get this to work reliably and consistently. I spent hours trying to debug this with Altera’s Signaltap logic probe, with no joy. I walked away from my workbench, frustrated, to think about alternatives.

This is a common problem that needed a simple solution. Many situations require just a ‘blip’ on a signal line to indicate ‘save this data’ or ‘go process what is collected’. If you think about how the negative-edge triggered code above would work, it would feed the clock into a simple gated D-type register. Technically, a double-edged always block just requires two of these, one with a negated input, but the double-edged configuration cannot easily be cleared by the next clock edge, as that edge would act on its sibling, not on the register that was just activated. (If that sounds confusing, it’s because it is, and I’m not really clear why the double-edged always block didn’t work in silicon but worked in simulation.) I’ll post my code on GitHub soon. Have a look, and I would welcome your thoughts as to why this double-edged always block worked in simulation but not in the FPGA.
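One possible explanation, offered as an assumption rather than a diagnosis: synthesis tools generally treat an always block with no posedge or negedge keyword as combinational logic and ignore its sensitivity list, whereas a simulator honours the list. The synthesized write_signal would then be a plain decode of bit_count, which can glitch while the counter bits settle. The sketch below shows one way to get a single ‘blip’ per byte using only single-edge registers. It is not the code from this project: it assumes SPI mode 0 (data sampled on the rising edge, MSB first, clock idling low) and an active-low chip select, and every module and signal name in it is hypothetical.

```verilog
// Hypothetical SPI slave receive path built only from single-edge registers.
// Assumes SPI mode 0 and an active-low chip select; all names are illustrative.
module spi_rx_sketch (
    input  wire       spi_clk,
    input  wire       spi_cs_n,
    input  wire       spi_mosi,
    output reg  [7:0] fifo_data,   // last complete byte received
    output reg        byte_ready,  // set on the 8th rising edge, cleared on the next one
    output reg  [2:0] fifo_head    // write pointer, advanced once per byte
);
    reg [2:0] bit_count;
    reg [6:0] shift_in;

    initial fifo_head = 3'd0;      // relies on FPGA power-up initial values

    // Rising-edge control: count bits and flag a completed byte.
    // Chip select going high resets the bit counter asynchronously.
    always @(posedge spi_clk or posedge spi_cs_n) begin
        if (spi_cs_n) begin
            bit_count  <= 3'd0;
            byte_ready <= 1'b0;
        end else begin
            bit_count  <= bit_count + 3'd1;        // wraps 7 -> 0
            byte_ready <= (bit_count == 3'd7);     // true for one SPI clock period
        end
    end

    // Rising-edge datapath: shift MOSI in and latch the byte on the 8th bit.
    always @(posedge spi_clk) begin
        if (!spi_cs_n) begin
            shift_in <= {shift_in[5:0], spi_mosi};
            if (bit_count == 3'd7)
                fifo_data <= {shift_in, spi_mosi};
        end
    end

    // Falling-edge write: byte_ready acts as a registered write enable, so the
    // pointer advances on exactly one falling edge per byte. A real FIFO would
    // also store fifo_data at the old fifo_head on this same edge.
    always @(negedge spi_clk) begin
        if (byte_ready)
            fifo_head <= fifo_head + 3'd1;
    end
endmodule
```

The point of this arrangement is that nothing has to return the enable to its inactive state "in time": byte_ready is sampled by the last falling edge of the byte and is cleared by the first rising edge of the next byte (or by chip select going high), so no extra clock cycles are needed after the transfer ends.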

The waveform from Signaltap is also interesting:

Signaltap trace

Note the highlighted transition at tick 126 on the outbound_tail trace row. This shows a counter counting up at the completion of every byte, removing one entry from the FIFO queue. The distilled code goes like this:

always @(negedge spi_clk) outbound_tail <= outbound_tail + 1'b1;

From this code, it should not be possible to see a ‘3’ in between a ‘1’ and a ‘2’. However, this is an excellent example of the problem of signals crossing clock domains. The clock that serviced the Signaltap logic was generated internally by the FPGA. The clock that was triggering the incrementing counter was sourced from the Atmel SPI module. The Signaltap logic captured the wire states for the three wires making up the outbound_tail just as they were in transition. They don’t transition together, due to small delays in the logic path and the differing rise and fall times. What happened here was that one bit transitioned high before the other bit had a chance to transition low:

Time | Bit (1) | Bit (0)
  0  |    0    |    1      (value 1)
  1  |    1    |    1      (value 3) < bit 0 has not had time to fall to the low state yet
  2  |    1    |    0      (value 2)

This is a good example of why a Gray code is used to represent counters across clock domains. If you look at my FIFO buffer code, this is exactly what I use to compare the head and tail of the FIFO. The head of the FIFO is managed by one clock domain and the tail is managed by another. It’s important that only one bit changes on the counter lines at each step, so that the counter can only appear to be in its previous state or its new state and not some other state that is out of sequence.
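To make the idea concrete, here is a minimal sketch of a Gray-coded pointer and a two-register synchronizer for carrying it into another clock domain. This is not the FIFO code from this project; the module names, port names and the WIDTH parameter are assumptions chosen for illustration.

```verilog
// Hypothetical FIFO pointer that keeps a binary copy for addressing memory
// and a registered Gray-coded copy for the other clock domain to sample.
module gray_pointer_sketch #(
    parameter WIDTH = 3
) (
    input  wire             clk,      // clock of the domain that owns this pointer
    input  wire             rst,      // synchronous reset
    input  wire             advance,  // step the pointer by one entry
    output reg  [WIDTH-1:0] bin,      // binary value, used locally
    output reg  [WIDTH-1:0] gray      // Gray value, only one bit changes per step
);
    wire [WIDTH-1:0] bin_next = bin + advance;

    always @(posedge clk) begin
        if (rst) begin
            bin  <= {WIDTH{1'b0}};
            gray <= {WIDTH{1'b0}};
        end else begin
            bin  <= bin_next;
            gray <= bin_next ^ (bin_next >> 1);  // binary-to-Gray conversion
        end
    end
endmodule

// Hypothetical two-register synchronizer in the receiving clock domain.
module sync_2ff_sketch #(
    parameter WIDTH = 3
) (
    input  wire             dst_clk,
    input  wire [WIDTH-1:0] gray_in,
    output reg  [WIDTH-1:0] gray_sync
);
    reg [WIDTH-1:0] meta;   // first stage, may go metastable

    always @(posedge dst_clk) begin
        meta      <= gray_in;
        gray_sync <= meta;
    end
endmodule
```

With WIDTH = 3 the gray output steps through 000, 001, 011, 010, 110, 111, 101, 100 and back to 000. The step from 1 to 2 is 001 to 011, so a sample taken mid-transition can only read the old value or the new one, never a stray ‘3’.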

I also experienced another oddity when trying to capture large swathes of data to diagnose the double edge problem described earlier:

The JTAG link that I’d rigged onto the board through tiny patch wiring wasn’t really suitable for high speed data and was corrupting the Signaltap data as it came across. It’s an interesting message, as it suggests that Altera is making sure that the signals you get in your Signaltap traces are accurate by some sort of integrity check. Sort of like a CRC for your wires!

So, finally, three months beyond where I expected to be, I had a reliable and performant SPI connection between the two devices. Here’s the normalized trace of a real signal going across:

SPI signals between the SAM4S and the FPGA

And finally, a screenshot of the updated Atmel SAM4S supervisor software:

You may spot that I’ve added a couple of new options to the menu.

  • Option 0 – connect to CPC – this provides a channel into a STDIO interface in the support CPU
  • Option 7 – toggle support CPU reset – this allows me to hold the Z80 CPU in reset while I upload a new ROM image, then release reset with either option 7 or option 0 to allow the program to run.

This works terrifically well and allows me to rapidly update the supervisor software without having to recompile the RTL modules (which takes about 3-4 minutes even now!). I can also scan the memory for key variables to ensure the code is working, even without having the STDIO channel working yet. I use the Quartus memory monitor to upload, download and scan sections of memory for changes. This is how I can be sure my keystrokes are coming across the SPI interface intact: I extract the received data and place it in a fixed memory position that I monitor, so I can see what was transmitted.

Next steps are to tidy up the code, expand the FIFOs from their current test size of 8 bytes, write the STDIO code at both ends of the connection, and write a test harness to demonstrate STDIO happening inside the soft Z80 CPU.

Just a few more hours then! Feel free to leave comments or questions on this somewhat convoluted post and I’ll do my best to redraft anything that’s not clear.

Update: I’ve now posted my files on GitHub.

* I worked out that I get, on average, about 8 hours a week to work on this, split over several evenings. If it seems slow to me, it must seem agonizing to anyone following this blog!

