cpu design

Vertex78 · 2008-02-16 20:58

I've been reading up on how a CPU works on the inside and I am having trouble understanding how you would implement the sequencing. The only way I can think of to do it would be to have a clock signal go into a counter, then have the counter hooked up to a decoder. So if you have a 3-to-8 decoder, that would give you 8 lines that sequentially go high, and you could use those to control the different steps of fetching an instruction, decoding it, and so on. But then that design would take 7 clock cycles for all of the decoder lines to get used, causing each instruction to take 7 clock cycles. So how would a design be set up, for instance, like on the SX chip, where some instructions take 1 clock cycle?
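To make the counter-plus-decoder idea concrete, here is a minimal Python sketch of it. The function names (`decode_3_to_8`, `run_sequencer`) are made up for illustration, and it models only the logic levels, not real gate timing.

```python
def decode_3_to_8(count):
    """Return the 8 decoder output lines; exactly one is high (1)."""
    selected = count & 0b111  # only the low 3 bits feed the decoder
    return [1 if line == selected else 0 for line in range(8)]


def run_sequencer(clock_ticks):
    """Advance a 3-bit counter once per tick and record the decoder outputs."""
    counter = 0
    history = []
    for _ in range(clock_ticks):
        history.append(decode_3_to_8(counter))
        counter = (counter + 1) & 0b111  # wrap around like a 3-bit counter
    return history


if __name__ == "__main__":
    for tick, lines in enumerate(run_sequencer(8)):
        print(tick, lines)
```

Each output line would gate one step (fetch, decode, and so on), which is why every instruction in this scheme costs one full trip around the counter.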

Mike Green · 2008-02-16 21:57

There are several ways this can be done.

1) All of the work can be done in parallel so that it really happens in one clock cycle ... rarely done.

2) Pipelining can be done, where several instructions are in different stages of processing at the same time. The net rate is one instruction per clock cycle, but any given instruction may take several clock periods to process. This is commonly done so that the decoding of an instruction, perhaps including the processing of index registers, happens at the same time as the processing of the previous instruction.

3) The clock can be multiplied on-chip so the internal clocking of the cpu is several times faster than the apparent / external clock. This can also be done with a delay line so that the clock pulse can be offset by some fixed amount(s) of time and used to provide different phases. Sometimes the positive and negative edges of the clock pulse are used to provide two different phases for processing.

4) The current Propeller, for example, takes 4 clock cycles (~12.5 ns each) to process an instruction, and the next instruction is fetched before the result of the current instruction is stored. That's pipelining.
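The pipelining point (one instruction finishing per clock even though each one takes several clocks) can be sketched in Python as below. The four stage names and the `simulate_pipeline` function are assumptions for illustration, not the stages of any particular chip, and the model ignores stalls and branches.

```python
STAGES = ["fetch", "decode", "execute", "store"]  # hypothetical stage names


def simulate_pipeline(instructions):
    """Return one snapshot per clock cycle: stage name -> instruction (or None)."""
    depth = len(STAGES)
    total_cycles = len(instructions) + depth - 1
    timeline = []
    for cycle in range(total_cycles):
        snapshot = {}
        for stage_index, stage in enumerate(STAGES):
            # Instruction i is in stage s during cycle i + s.
            instr_index = cycle - stage_index
            in_flight = 0 <= instr_index < len(instructions)
            snapshot[stage] = instructions[instr_index] if in_flight else None
        timeline.append(snapshot)
    return timeline


if __name__ == "__main__":
    for cycle, snapshot in enumerate(simulate_pipeline(["A", "B", "C", "D", "E"])):
        print(cycle, snapshot)
```

Five instructions finish in 8 cycles here instead of 20. Once the pipe is full, one instruction leaves the last stage every cycle, although each individual instruction still spends 4 cycles inside.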

LoopyByteloose · 2008-02-17 12:04

Take a look at the SX documentation to figure out one set of pipeline and counter features.

Both the SXes and a single COG of the Propeller are good study models. A lot of the other RISC micros have dedicated hardware in addition to the basic micro. All that can be distracting. Study of the SX's Virtual Peripherals code really can provide you with a heck of a lot of insight into how others implement such features in hardware.

An advantage of looking at the Propeller COG is that the memory is flat and only 512 32-bit words in pure RAM. But of course it is 32 bits, and that might be a distraction if you're wanting to think only in bytes.

The SXes are presumed by many to be only 8-bit, but the machine code and memory actually use 12 bits in counters. For the SX28, one bit has been rendered redundant due to it only having the EEPROM and half the RAM. There are pages of 512 12-bit words of program code in the EEPROM and a series of RAM banks of 16 8-bit bytes.

For assembly language programming, the Propeller's COG may actually be easier to follow because the program counter doesn't have the pages to mess with. And there isn't a second system of RAM banks.

I don't think you'll find anything simpler for study purposes unless you go back in time to 4-bit microcontrollers.

The program counter does not merely march on forever. Loops or jumps are formed by resetting the counter, and 'sub-routines' push and pop the counter's value onto and off a stack. The SX has an 8-item stack. I'm not sure what is in the Propeller.
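As a rough illustration of that last point, here is a small Python model of a program counter with jumps and a fixed-depth return stack. The `ProgramCounter` class and its method names are invented for this sketch. Raising an error on overflow is a simplification, since real hardware may behave differently when the stack fills.

```python
class ProgramCounter:
    """Toy program counter with a fixed-depth return stack (hypothetical model)."""

    def __init__(self, stack_depth=8):
        self.pc = 0
        self.stack = []
        self.stack_depth = stack_depth

    def step(self):
        """Normal case: march on to the next instruction."""
        self.pc += 1

    def jump(self, target):
        """A jump or loop is just a reload of the counter."""
        self.pc = target

    def call(self, target):
        """Save the return address on the stack, then jump."""
        if len(self.stack) >= self.stack_depth:
            raise OverflowError("return stack is full")  # simplification
        self.stack.append(self.pc + 1)
        self.pc = target

    def ret(self):
        """Restore the counter from the most recent saved return address."""
        self.pc = self.stack.pop()


if __name__ == "__main__":
    counter = ProgramCounter()
    counter.step()
    counter.call(100)
    counter.step()
    counter.ret()
    print(counter.pc)
```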

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
PLEASE CONSIDER the following:

Do you want a quickly operational black box solution or the knowledge included therein?

Tropically, G. Herzog [ 黃鶴 ] in Taiwan

Post Edited (Kramer) : 2/17/2008 12:17:29 PM GMT

stevenmess2004 · 2008-02-18 12:08

The Propeller doesn't have a stack unless the user implements one. This is one of the main reasons that C is hard to port to it.

The cogs in the Propeller aren't a bad place to start.

There are actually only 32 different instructions, and two of these aren't implemented yet, so there are really only 30. The PropTool and the way the Propeller works allow it to pull some smart tricks so that it appears to have many more instructions than this.

The main thing is that every instruction can be executed conditionally and the different results can be written conditionally. This is similar to some of the ARM chips and allows for some very neat tricks.
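Here is a rough Python sketch of what conditional execution plus conditional result writing can look like. The condition names, the `execute_sub` function and the dictionary layout are all assumptions made for illustration, not the Propeller's actual encoding.

```python
# Hypothetical condition table: each entry decides from the Z and C flags
# whether the instruction runs at all.
CONDITIONS = {
    "always": lambda z, c: True,
    "if_z": lambda z, c: z,
    "if_nz": lambda z, c: not z,
    "if_c": lambda z, c: c,
    "if_nc": lambda z, c: not c,
}


def execute_sub(state, dest, src, cond="always",
                write_result=True, write_z=False, write_c=False):
    """Subtract src from dest, but only if cond holds; write back selectively."""
    if not CONDITIONS[cond](state["z"], state["c"]):
        return False  # condition failed: the instruction acts as a no-op
    a = state["regs"][dest]
    b = state["regs"][src]
    result = (a - b) & 0xFFFFFFFF  # 32-bit wraparound
    if write_result:
        state["regs"][dest] = result
    if write_z:
        state["z"] = result == 0
    if write_c:
        state["c"] = a < b  # borrow occurred
    return True


if __name__ == "__main__":
    state = {"regs": {"a": 5, "b": 5, "x": 1}, "z": False, "c": False}
    # A "compare" is just a subtract that keeps the flags and drops the result.
    execute_sub(state, "a", "b", write_result=False, write_z=True)
    execute_sub(state, "x", "a", cond="if_nz")  # skipped, since Z is set
    print(state)
```

One subtract operation covers both "subtract" and "compare" in this model, depending only on which results get written. That is one way a small instruction set can look much larger.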

Another thing that is nice is that all the instructions have the same format. On the AVR chips, many instructions keep their fields (opcode, registers, constants) in different bit positions, which can be really confusing if you are looking at the actual machine code.

Gadgetman · 2008-02-18 13:56

Don't forget that some CPUs handle the stack differently, or find other 'shortcuts'...

The Z80 family has not only the 'normal' set of registers (AF, BC, DE, HL, IX, IY, I, R, SP; did I forget any?), but also a 'mirror set' (denoted AF', BC' and so on).
To save the main registers (AF - HL), use EX AF,AF' for AF and EXX for BC, DE and HL. Each flips a bit internally, enabling the mirrored set instead of the main set (or back again).

Each of those takes 4 clock cycles. Doing 4 PUSH operations takes 44 cycles (11 each).
Afterwards, another EX AF,AF' and EXX and you're back and running with the main set.
Very handy for a tight bit of code (say, an interrupt handler).
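A small Python model of the register-bank trick follows. On the Z80 as documented, EXX exchanges BC, DE and HL, and a separate EX AF,AF' exchanges AF. The cycle constants are the T-state counts as I recall them from the Zilog documentation, so treat them as assumptions. The class name is made up, and the model swaps values where the real chip flips a selector bit (the visible effect is the same).

```python
EXX_CYCLES = 4     # assumed T-states for EXX
EX_AF_CYCLES = 4   # assumed T-states for EX AF,AF'
PUSH_CYCLES = 11   # assumed T-states for PUSH of a register pair


class Z80RegisterBanks:
    """Toy model of the Z80 main and alternate register pairs."""

    def __init__(self):
        self.main = {"AF": 0, "BC": 0, "DE": 0, "HL": 0}
        self.alt = {"AF": 0, "BC": 0, "DE": 0, "HL": 0}

    def _swap(self, names):
        for name in names:
            self.main[name], self.alt[name] = self.alt[name], self.main[name]

    def ex_af(self):
        """EX AF,AF': exchange AF with its alternate; returns the cycle cost."""
        self._swap(["AF"])
        return EX_AF_CYCLES

    def exx(self):
        """EXX: exchange BC, DE, HL with their alternates; returns the cycle cost."""
        self._swap(["BC", "DE", "HL"])
        return EXX_CYCLES


if __name__ == "__main__":
    banks = Z80RegisterBanks()
    banks.main.update(AF=1, BC=2, DE=3, HL=4)
    swap_cost = banks.ex_af() + banks.exx()
    print("bank swap:", swap_cost, "cycles; four PUSHes:", 4 * PUSH_CYCLES, "cycles")
```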

How interrupts are handled is also something worth looking into...
And the Z80 again goes nuts...
In one mode, it'll wait for the interrupting unit to put a byte onto the data bus, then combine it with the 8-bit I register to make a 16-bit address.
Then it'll fetch a 16-bit address from the location pointed to and branch.
That means it is possible for it to handle 128 different interrupts (not at the same time, though)...
Never seen anything actually use that mode, though...
The R('Refresh') register is used for refresh of Dynamic RAM during the 'Decode' tick after instruction-fetch, and is a 'free-running' 7bit register.
(Nice for RND Generator. This is sometimes listed as an 8bit register, but that's just sloppy writing)
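The vectored interrupt lookup described above can be sketched in Python like this. The function name and the flat `memory` array are assumptions for illustration. The Z80 stores 16-bit values low byte first, which the sketch follows.

```python
def im2_handler_address(memory, i_register, bus_byte):
    """Return the handler address for a vectored interrupt.

    i_register supplies the high 8 bits of the table pointer and the byte
    from the interrupting device supplies the low 8 bits.
    """
    pointer = ((i_register & 0xFF) << 8) | (bus_byte & 0xFF)
    low = memory[pointer]                    # low byte is stored first
    high = memory[(pointer + 1) & 0xFFFF]
    return (high << 8) | low


if __name__ == "__main__":
    memory = bytearray(65536)
    memory[0x8004] = 0x34  # handler address 0x1234, low byte first
    memory[0x8005] = 0x12
    print(hex(im2_handler_address(memory, 0x80, 0x04)))
```

Each table entry is two bytes, so a 256-byte table holds 128 entries, which is where the figure of 128 interrupts comes from.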

There are CPUs with more than one DATA-bus, too. One for Executable code and one for Stack.

Bit-slice CPUs?

Note that on a 'normal' CPU, the MOV/LD, JP, PUSH, POP instructions are all the same, but with a few fancy bells and whistles added for show.
(MOV and LD are different names for the same beast, JP/JUMP is an LD/MOV of something to the PC register, and PUSH/POP is a MOV to/from memory with an implied Inc/Dec of the SP.)
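To show how little separates those instructions, here is a toy Python CPU where JP, PUSH and POP are all built from one move operation. The `TinyCPU` class, its register names and the downward-growing stack are assumptions for illustration, not any real chip.

```python
class TinyCPU:
    """Toy CPU where jump, push and pop are all variations on a move."""

    def __init__(self, memory_size=256):
        self.regs = {"A": 0, "PC": 0, "SP": memory_size}  # stack grows downward
        self.memory = [0] * memory_size

    def mov_reg(self, dest, value):
        """The basic move: put a value into a register."""
        self.regs[dest] = value

    def jp(self, target):
        """A jump is a move into the PC register."""
        self.mov_reg("PC", target)

    def push(self, reg):
        """A push is a move to memory with an implied decrement of SP."""
        self.regs["SP"] -= 1
        self.memory[self.regs["SP"]] = self.regs[reg]

    def pop(self, reg):
        """A pop is a move from memory with an implied increment of SP."""
        self.mov_reg(reg, self.memory[self.regs["SP"]])
        self.regs["SP"] += 1


if __name__ == "__main__":
    cpu = TinyCPU()
    cpu.mov_reg("A", 42)
    cpu.push("A")
    cpu.mov_reg("A", 0)
    cpu.pop("A")
    cpu.jp(0x10)
    print(cpu.regs)
```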

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Don't visit my new website...
