System/1 Build Log

An ongoing chronology of System/1's construction, and any other related musings.

14th November, 2016 - Revisiting the clock and machine cycle generator design

This entry was backdated.

One of the earliest parts of the system I prototyped, back in September 2014, was the clock and machine cycle generator. This contains an oscillator to generate a clock (at 4MHz in the version I put together on a breadboard) along with the logic to implement the main states of the machine — the six it cycles through for each instruction (FETCH_1, FETCH_2, FETCH_3, EXECUTE, MEM_OP, STORE) plus two supervisory states (SUPER_SHORT and SUPER_LONG). As mentioned in the previous discussion, these states come in two varieties; most take 32 clock cycles to complete (corresponding to the 32 bits in the machine word, since this is a bit-serial design), whereas others are shorter (to avoid wasting an entire 32-bit cycle for a memory access that can complete in two clocks). Since the registers in the machine should only be clocked in full 32-bit words, this means two clock signals need to be generated; BIT_CLK always runs in 32-cycle bursts, and is used to clock data bits around the machine, and MEM_CLK is free to run for as long as is needed since it won't affect the alignment of data stored in registers. In addition, a SYS_CLK clock is provided which is ungated; this turned out to be required by the memory data register, which needs to clock serial data internally to the CPU as well as parallel data to/from the memory bus during the shorter cycles.
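
To make the alignment point concrete, here is a minimal Python sketch of why a serial register wants its clock in whole-word bursts. It assumes, purely for illustration, a register that recirculates its output bit back to its input (the `clock_register` helper and the test value are hypothetical, not part of System/1): after 32 clocks the word is back where it started, whereas a two-clock burst leaves it misaligned.

```python
WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1


def clock_register(word: int, clocks: int) -> int:
    """Rotate a 32-bit word right by one bit per clock.

    Models a recirculating serial register (an assumption made for this
    sketch): the bit shifted out of the low end is fed back in at the top.
    """
    word &= MASK
    for _ in range(clocks):
        lsb = word & 1
        word = (word >> 1) | (lsb << (WORD_BITS - 1))
    return word


value = 0x12345678                        # arbitrary test pattern
full_burst = clock_register(value, 32)    # a complete BIT_CLK-style burst
short_burst = clock_register(value, 2)    # a short, two-clock cycle

print(hex(full_burst), full_burst == value)
print(hex(short_burst), short_burst == value)
```

This is the reason a short cycle can safely run only on a clock that the data registers never see.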

All of this had already been prototyped and tested on the breadboard, but now that more of the system is completed (notably the instruction decoder) it became apparent that my original design was a bit slow. Although it had worked fine, using a 74HC161 synchronous counter feeding a 74HC4514 4-to-16 decoder to keep track of the active machine cycle is clunky; when I sat down to work out the worst-case timings from the (admittedly very pessimistic) figures given in the relevant datasheets, it looked like it might take nearly an entire clock cycle (250ns at 4MHz) before the outputs were guaranteed stable. This wouldn't leave very long for the instruction decoder to generate the right control signals when entering the EXECUTE phase, much less for the ALU to calculate the first bit of the result before it was needed. I'm not totally wedded to the 4MHz clock I'd prototyped with (it was just the crystal I happened to have on hand) but it seems a reasonable sort of figure to aim for.
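
The arithmetic behind that worry can be sketched in a few lines of Python. Only the 4MHz clock and the 32-bit word come from the design; the two stage delays in the example call are placeholder numbers chosen for illustration, not figures from any datasheet, and the function names are made up for this sketch.

```python
def clock_period_ns(freq_hz: float) -> float:
    """Length of one clock cycle in nanoseconds."""
    return 1e9 / freq_hz


def remaining_budget_ns(freq_hz: float, *stage_delays_ns: float) -> float:
    """Time left in one clock cycle after a chain of propagation delays."""
    return clock_period_ns(freq_hz) - sum(stage_delays_ns)


period = clock_period_ns(4_000_000)       # 250.0 ns per clock at 4MHz
word_time_us = 32 * period / 1000         # 8.0 us for a full 32-bit cycle

# Placeholder delays (NOT datasheet values): counter, then decoder.
slack = remaining_budget_ns(4_000_000, 100.0, 120.0)

print(period, word_time_us, slack)
```

Whatever is left over after the state outputs settle is all the decoder and ALU get before the first bit of the result is needed.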

Luckily I already knew the solution to this tight squeeze, as it had been suggested to me when I was still mucking about with the earlier design: rather than using a 4-bit counter and then decoding that to give the various cycle signals, I could use a simple 8-bit D-type flip-flop to drive them directly and define the next state for each cycle in terms of simple logic equations based on the current state. This is known as a one-hot encoding of the current state, and I'd previously decided against it because it sounded like harder work than the design I'd already cooked up. Now it seemed like it might be time to revisit that decision...

Sitting down with the logic analyser, a breadboard and a nice big pile of wires, I reconstructed the section of the earlier circuit responsible for counting clock cycles (to generate either the shorter or longer trains of pulses) and then used this to generate a clock signal for an 8-bit DFF. Since this starts off empty, there was then a slightly-unfortunate mess of wiring to decode the all-zeroes state to the PANEL_SYNC signal (asserted when the machine is idle and awaiting operator interaction from the front panel); this then became the basis to enter either the FETCH_1, SUPER_SHORT or SUPER_LONG states as requested. FETCH_2, FETCH_3 and EXECUTE always follow on from FETCH_1 in that order, so they consist of a single wire each, and then MEM_OP and STORE follow on in turn from EXECUTE if requested by the instruction decoder. Within about an hour I had a working Mk. II design, which looked significantly faster when examined on the logic analyser — the state signals change almost immediately after the relevant clock edge, as is to be expected from a simple flip-flop, so I no longer need to worry about wasting a large chunk of my timing budget on simply deciding what part of the instruction cycle the machine is in.
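
The next-state logic described above can be sketched as a small Python model. This is a guess at the shape of the equations, not the actual wiring. In particular it assumes that STORE can follow either MEM_OP or EXECUTE directly, that the supervisory requests take priority over a run request, and that the register falls back to all-zeroes (PANEL_SYNC) at the end of every sequence. The `Requests` fields are hypothetical names for the control inputs.

```python
from dataclasses import dataclass

STATES = ("FETCH_1", "FETCH_2", "FETCH_3", "EXECUTE",
          "MEM_OP", "STORE", "SUPER_SHORT", "SUPER_LONG")
BIT = {name: 1 << i for i, name in enumerate(STATES)}  # one flip-flop each


@dataclass
class Requests:
    """Hypothetical control inputs; at most one of the first three is expected."""
    run: bool = False
    super_short: bool = False
    super_long: bool = False
    mem_op: bool = False   # instruction decoder wants a MEM_OP cycle
    store: bool = False    # instruction decoder wants a STORE cycle


def next_state(q: int, req: Requests) -> int:
    """Compute the D inputs of the 8-bit register from its current outputs."""
    def on(name: str) -> bool:
        return bool(q & BIT[name])

    idle = q == 0  # all-zeroes decodes to PANEL_SYNC
    d = {
        "SUPER_SHORT": idle and req.super_short,
        "SUPER_LONG": idle and req.super_long and not req.super_short,
        "FETCH_1": idle and req.run and not (req.super_short or req.super_long),
        "FETCH_2": on("FETCH_1"),   # "a single wire each"
        "FETCH_3": on("FETCH_2"),
        "EXECUTE": on("FETCH_3"),
        "MEM_OP": on("EXECUTE") and req.mem_op,
        "STORE": req.store and (on("MEM_OP") or (on("EXECUTE") and not req.mem_op)),
    }
    return sum(BIT[name] for name, bit in d.items() if bit)


def trace(req: Requests, max_steps: int = 10) -> list:
    """Clock the register from idle until it returns to all-zeroes."""
    q, names = 0, []
    for _ in range(max_steps):
        q = next_state(q, req)
        if q == 0:
            break
        assert q & (q - 1) == 0, "more than one state bit set"
        names.append(STATES[q.bit_length() - 1])
    return names


print(trace(Requests(run=True, mem_op=True, store=True)))
print(trace(Requests(run=True)))
print(trace(Requests(super_long=True)))
```

Because each state output comes straight from a flip-flop, there is no decode stage between the clock edge and the state signal, which is where the speed-up comes from.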

With that built and looking good in isolation, I figured I'd try attaching it to the rest of the system to see how well it worked in practice. Unhooking the Nucleo board that had been generating all the clock and state signals during my front-panel testing, I hooked flying leads up to the appropriate lines out of the new state machine, reconnected the memory interface and RAM board (since the Nucleo had been standing in for those, too, in order to synthesize an instruction stream on command), and toggled some test code in via the temporary front panel. The panel responded to button presses as expected, which was a good sign, but the machine promptly went haywire when asked to execute the code. Hmm.

After a bit of digging it became apparent that this was not actually the fault of the new clock generator — in fact, it was replacing the Nucleo-emulated memory interface with the real one that triggered it. Taking a closer look at the instruction word register as code was executing showed that the word was shifted right by one bit, which naturally meant it then wasn't being decoded correctly; some experimentation showed that this was happening as the code was being entered via the front panel, and in fact toggling the data display from memory data to an internal register and back was enough to cause the same shift. A little headscratching later and I decided this was probably caused by a weak pull-down on the LOAD_MDR line; since the MDR is clocked from SYS_CLK, if LOAD_MDR is slow to deassert after data has been loaded into the register it'll end up being shifted along during the following clock cycle (or cycles, if it's particularly slow!). In normal execution this isn't a problem, as the instruction decoder will take LOAD_MDR low explicitly during the following cycle. However, when performing memory operations via the front panel, the front panel controller can only assert LOAD_MDR; it relies upon the pull-down to deassert it when released. In this case it appears this wasn't happening quickly enough, and installing a stronger pull-down resistor on the breakout board fixed the problem. None of the other registers whose LOAD_ signals are treated in the same way are clocked from SYS_CLK, and so don't care about how slowly the signal falls as long as it does it outside of a word cycle; the test code on the Nucleo was inadvertently mimicking this behaviour which meant it hadn't revealed the problem earlier.
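
For a feel of why the pull-down strength matters, the fall time of a released line can be estimated from the usual RC discharge formula, t = RC ln(V_start / V_threshold). The sketch below is purely illustrative: the resistor values, the 20pF of stray capacitance, the 5V supply and the 2.5V switching threshold are all assumed numbers, not measurements from the machine, and `fall_time_s` is a made-up helper.

```python
import math

CLOCK_PERIOD_S = 1 / 4_000_000   # one SYS_CLK cycle at 4MHz (250ns)


def fall_time_s(r_ohms: float, c_farads: float,
                v_start: float = 5.0, v_threshold: float = 2.5) -> float:
    """Time for an RC-discharged line to fall from v_start to v_threshold."""
    return r_ohms * c_farads * math.log(v_start / v_threshold)


STRAY_CAPACITANCE = 20e-12       # assumed, for illustration only

for label, resistance in (("weak", 100e3), ("strong", 4.7e3)):
    t = fall_time_s(resistance, STRAY_CAPACITANCE)
    cycles = t / CLOCK_PERIOD_S
    print(f"{label} pull-down: {t * 1e9:.0f} ns, about {cycles:.2f} clock cycles")
```

With numbers in this range, a weak pull-down leaves the line looking asserted for several ungated clock edges, while a stronger one gets it low well inside a single cycle.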

With that sorted out, I tried running some test code again. Since the machine has no I/O yet, you'll have to take my word for this, but a short loop (adapted from an earlier test) running with no external jiggery-pokery successfully filled a chunk of memory with the Fibonacci sequence, to be read out via the front panel indicators. Sounds like a milestone to me!

To be celebrated in the traditional style: a triumphant tweet and a beer...