Doesn’t time fly? It’s been 6 months since my last post! My only excuse is that I started a new job and learning a new culture and processes is pretty exhausting. I have tended to work on this project during the evening, but kids being what they are, rarely co-operate when you need some project time. Time to work on the CPC2 has been limited indeed.
There’s been a fair bit of activity though, so let me take you through what has been done.
HyperRAM is essentially fairly simple and at slower clock speeds, it’s pretty easy to get a large volume of data in and out of the chip. However, once you get over about 30MHz, timing gets a little challenging to understand.
I started with a Verilog simulation of both the HyperRAM and the controller and used a test bench to check the logic of streaming data in and out. I created a HyperRAM controller core that had two interfaces, one for byte operations and one for streaming. The byte interface would read or write a single byte for serving the CPC Z80 core. The streaming interface would service the video data cores, reading or writing 1024 bytes at a time. The interfaces were mutually exclusive, so only one could be used at any one time.
To ensure that the 4MHz Z80 isn’t starved of data, I needed the RAM to return data to the CPU less than 375nS after it was requested. This sounds like a generous amount of time when the HyperRAM can run at 100MHz with a 40nS access time.
However, the HyperRAM receives instructions over the bus as well and requires 3 clocks for the instruction and between 3 and 8 clocks of latency. The worst case scenario is that 12 clocks with a total period of 120nS is needed before any data is available on the data pins. Still sounds good, doesn’t it? Well, as with all things in the real world, physics got in the way.
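The arithmetic behind that figure can be sketched in a few lines of Python. The function name is made up for illustration, and the 12-clock worst case is taken straight from the paragraph above rather than derived from the datasheet.

```python
def ns_until_first_data(clocks, clock_mhz):
    """Time in nanoseconds for `clocks` bus clocks at `clock_mhz` MHz."""
    return clocks * 1000.0 / clock_mhz


# Worst case quoted above: 12 clocks before data appears on the pins.
WORST_CASE_CLOCKS = 12

at_100mhz = ns_until_first_data(WORST_CASE_CLOCKS, 100)
print(f"{at_100mhz:.0f}nS before first data at 100MHz")
```

The same function shows how quickly the margin shrinks if the bus clock is lowered, since the clock count stays fixed while each clock gets longer.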
The HyperRAM is a DDR device, meaning that for a 100MHz clock, I needed parts of the logic to have the data available in half the clock cycle, or 5nS. This timing is possible, but can be difficult to achieve with low-end devices, such as the speed class 8 devices that I’m building around.
When I used the DDR Altera IP, I had no issues in the simulation, but when I compiled the core and ran it in real silicon, the data seemed somewhat scrambled. By this, I mean that the data was there, but it came out in a sequence that was byte-swapped from the simulation and the first byte was missing. I was stumped for some time on this problem.
I couldn’t diagnose the problem easily with SignalTap as it didn’t seem to read the BIDIR port reliably at full speed, and compilation errors suggested that I couldn’t meet the timing requirements of the SignalTap core. Since the core is essentially a black box, it wasn’t possible for me to diagnose why the compiler couldn’t meet the timing requirements of its own obfuscated tools. I was left guessing why the simulation wasn’t matching the real compiled core. It was quite obvious that it was a timing issue, but it wasn’t obvious how I was to find and fix it.
Ultimately, after reading and re-reading the datasheet for the HyperRAM, one timing parameter gave me a clue. The parameter describes the time from the clock transition to the data lines being valid as being 7nS. Since at 100MHz, I was sampling the first data byte at 5nS from the clock transition, it was sampling the data lines before they were valid.
To prove this theory, I slowed down the interface to 10MHz, giving the data lines plenty of time to settle before the core samples the data signals. The core worked perfectly at this speed and this was my “ah-ha” moment!
I tried a number of clock speeds to try and determine at which point the interface stopped working correctly. Adjusting the clock period up and down, I worked out that around 35MHz the interface stopped functioning and started to drop the first byte and reverse the byte order.
So it turns out that at 35MHz, half a clock cycle is around 7nS, which is the time it takes for the DQ lines to become valid after the clock transitions. Above this speed, the DQ lines only become valid after the clock has transitioned for a second time and the core has already captured the first data point. In this situation, the data bytes are sampled into the DDR register in reverse. I dallied with adjusting the timing constraints file to change the relationship between the clock and data lines, but this was complicated by the fact that I was sampling the DDR data in the middle of the data eye. It was far easier to adjust the core to swap the data byte lines from the DDR registers and delay by one clock. Using this technique, I don’t have to adjust the timing constraints and I can leave the CK-DQ valid at half of the clock period.
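To make the swap-and-delay idea a bit more concrete, here is a simplified behavioural model in Python. It is not the Verilog from the core: the function names are invented, and it assumes the late data shows up exactly one clock edge behind, which is my reading of the symptoms described above.

```python
def ddr_capture(bus_bytes, late_edges=0):
    """Pair bus bytes as a DDR input register would, one byte per clock edge.

    late_edges models data arriving that many clock edges after the
    capture register starts sampling; None marks a not-yet-valid sample.
    """
    edges = [None] * late_edges + list(bus_bytes)
    if len(edges) % 2:
        edges.append(None)  # pad so every clock has two edge samples
    return [(edges[i], edges[i + 1]) for i in range(0, len(edges), 2)]


def realign(pairs):
    """Swap the two byte lanes and delay one lane by a clock.

    Each corrected pair takes the second lane of this clock and the
    first lane of the following clock.
    """
    return [(now[1], nxt[0]) for now, nxt in zip(pairs, pairs[1:])]


data = [0x11, 0x22, 0x33, 0x44]
on_time = ddr_capture(data)          # what the simulation showed
late = ddr_capture(data, late_edges=1)  # what real silicon showed
fixed = realign(late)
print(on_time, late, fixed)
```

In the late capture the first byte lands in the wrong lane and the pairs straddle two clocks, which is why the output looks swapped with a byte missing. After realigning, the pairs match the on-time capture again, at the cost of one extra clock of delay.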
After many, many iterations of the HyperRAM core, I decided that it wasn’t necessary to run the core at 100MHz, so I opted to clock the core at half the top speed, just 50MHz. This is still over the 35MHz inversion point, so the core logic didn’t have to change. I toyed with the idea of creating a self-calibrating data timing core that would work out where the center of the data eye is, but this seemed overkill for the application.
However, 50MHz made the timing quite tight for the byte interface. You’ll recall from earlier that at 100MHz, the first data byte isn’t available until 120nS from the request. At 50MHz, it’s a minimum of 240nS before any data is available at the data pins. When you add in the timing cost of the 3-stage synchronizers from the CPU to the core (+60nS) and the synchronizers to get the data back (+20nS), plus the cost of the internal finite state machine logic, the total period is around 360nS of the maximum 375nS. Eeek, that’s tight timing, but fortunately it works for both the byte interface and the streaming interface.
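Laid out as a budget, the numbers above look something like the following Python sketch. The 40nS state machine figure is not a measurement; it is simply what is left over once the quoted items are subtracted from the roughly 360nS total.

```python
Z80_DEADLINE_NS = 375   # maximum time allowed before the Z80 needs its data
QUOTED_TOTAL_NS = 360   # approximate total quoted above

# Items with explicit figures in the text, all in nanoseconds.
budget_ns = {
    "HyperRAM command and latency at 50MHz": 240,
    "3-stage synchronizers, CPU to core": 60,
    "synchronizers, core back to CPU": 20,
}

accounted_ns = sum(budget_ns.values())
fsm_ns = QUOTED_TOTAL_NS - accounted_ns    # inferred, not measured
slack_ns = Z80_DEADLINE_NS - QUOTED_TOTAL_NS

for item, cost in budget_ns.items():
    print(f"{cost:4d}nS  {item}")
print(f"{fsm_ns:4d}nS  state machine logic (inferred)")
print(f"{slack_ns:4d}nS  slack before the deadline")
```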
After the RAM timing was working, I proceeded to update the video core to read and write blocks of data from the RAM core. The CRTC renderer captured blocks of 800 pixels in a complete scan line before flushing to the RAM. It used two buffers to allow one buffer to be written to the RAM while the other is being rendered by the next scan line.
The HDMI video render core will read the memory one scan line ahead of time and save this data ready for transport across the HDMI data lines, through the ADV7513 video chip.
The HyperRAM acts as a buffer and allows the CRTC to run at 50Hz and the HDMI output to run at 60Hz without conflicting. There will be some screen tearing evident with the core as it’s coded now, but this mechanism has the potential to allow repeating frames to prevent image tearing. The double line buffering in both the CRTC and HDMI cores enables a catch up frame in the HDMI every 5 CRTC frame writes, if needed.
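The 5-to-6 relationship between the two frame rates can be seen with a quick Python sketch. This is only an illustration of the ratio, not how the cores are coded, and the function name is hypothetical.

```python
def source_frame(hdmi_frame, crtc_hz=50, hdmi_hz=60):
    """Index of the most recent complete CRTC frame for a given HDMI frame."""
    return hdmi_frame * crtc_hz // hdmi_hz


# Which CRTC frame each of the first 12 HDMI frames would show.
shown = [source_frame(n) for n in range(12)]
print(shown)
```

Every group of six HDMI frames covers five CRTC frames, so one CRTC frame in each group is shown twice. That repeated frame is the catch-up frame.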
The only remaining system logic is the USB code for the Microchip USB3320 to handle USB keyboards and joysticks. I put this to one side for now, in favour of getting a new board built.
A new PCB and unplanned obsolescence
After using the DE10-Nano with the add-on HyperRAM boards for recent developments, I decided it was time for a new iteration of the custom board, correcting a lot of the mistakes of previous builds and simplifying the design.
Here are the board layouts:
You may be able to tell that these boards are a little larger than previous versions. I needed the extra space for the mounting posts in the corners as well as fitting the ESP32 on the bottom of the board. The mounting posts will be used to mount the board inside the CPC2 case as well as keeping the board off the oven bed when baking the top side components.
I’ve also moved the connectors to the top edge (back) of the board, just like the CPC as this will be more in keeping with the CPC style when mounted in a 3D printed case.
You might spot a few footprint changes, such as segmented ground flags under the video chip near CONN2, the USB chip and newly added USB to serial chip in the top left. These segmented ground flags should help avoid some of the solder paste bridging that I’d experienced in earlier builds as solder paste flowed out from under these ground flags to bridge the pins. This was the cause of the failure in the USB chip in the previous build and necessitated removing this chip to make the board work.
I’ve also added a bunch of new LEDs (because one can never have enough LEDs!) for serial communication across the USB-Serial connection, power and indicating hard-disk activity.
I also added the 1mm pitch eMMC that will provide on-board storage for the CPC2 and the two HyperRAM modules.
I dropped the SAM4 microcontroller in favour of just using the JTAG connection for programming and a 256MB NOR Flash for power-up booting.
Some of the components had been end-of-life’d (EOL). This means that the manufacturer had stopped making the component, usually in favour of a superior design. This had happened to me in a previous build, as some passive components had been EOL’d (who retires SMT capacitors anyway???). Unfortunately this time, the component EOL’d was the micro USB-B port connector that provided both power and debug to the board. Without this, the board wouldn’t function at all as this connector provides all of the power to the board. There was no alternative and no compatible replacement, so my board design was trash unless I could find these connectors.
I found some connectors available for $27 on AliExpress for a batch of 1 (boo!) and fortunately, Element14 still had some stock, available until exhausted. I ordered some stock.
Also, it seems that nobody is stocking the HDMI chip any more. The ADV7513 seems to be in very short supply. I checked the Analog Devices web site and it seems the devices are still in production and recommended for new designs, so I don’t know why Mouser, Digikey and Element14 are no longer stocking these devices. I suspect it has something to do with a non-disclosure agreement to use the device due to HDMI capabilities, but this is a bit of a moot point as the documentation is available on many web sites. AliExpress has dozens of suppliers willing to provide these chips, but I wasn’t willing to try my luck on this build. I have a stock of half a dozen of these chips, so I’m good for a while.
While I’m waiting for everything to arrive in the post, I’ll work on the USB host core and research how the heck 3D printing works for the CPC2 case.