CALL #{\}A Syntax Question

Bob Drury · 2021-09-30 23:30

CALL #{\}A Call to A by pushing {C, Z, 10'b0, PC[19:0]} onto stack. If R = 1 then PC += A, else PC = A. "\" forces R = 0.

1) Where is the stack for CALL/RET? If it is 8 registers, is it a part of the cog CPU, with no access other than through the instructions CALL and RET?

2) In {C, Z, 10'b0, PC[19:0]}, the C, Z and PC[19:0] make sense, but what is 10'b0?

3) What is R? Wouldn't the return (RET) address always be the next address after the CALL? If the CALL is at
address $045, the return address is $046. With PC += A, wouldn't that give PC = PC + A = 2A? With 0 < PC < 511, how does the wraparound work?

4) CALL #iotd would push stuff onto the stack somehow, then load the PC with address iotd, then on the next clock
execute the instruction at iotd. What does CALL #{\}A mean? I interpret this as CALL #\A, forcing
R to zero. Something else must set R.

Help would be appreciated.
Regards
Bob (WRD)

Ariba · 2021-10-01 00:15

1) Yes, it's an 8-level stack, separate from cogRAM. There are other instructions that can use this stack, like PUSH and POP.

2) 10'b0 is Verilog syntax for 10 zero bits. They just expand the PC and the flags to 32 bits.

3) R switches between relative and absolute addressing for the call. This has nothing to do with the return address. The address range is not limited to 0..511; you can jump to LUTexec ($200..$3FF) and HUBexec ($400 and up).

4) #\LABEL is an absolute jump, #LABEL is a relative jump. So R is set if you don't write a \ .
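A minimal Python sketch of what CALL #{\}A does with the two pieces discussed above: the 32-bit long it pushes, and the new PC. Assumptions, labeled here and in the comments: the pushed PC is the return address (the instruction after the CALL), a relative A is a signed offset from that next address, and the example uses cogRAM, where the next instruction is PC + 1. The function names are made up for illustration. Note that in the relative form A is an offset, not the target address, so the "2A" case does not arise.

```python
PC_MASK = 0xFFFFF  # the PC is 20 bits wide


def call_frame(c: int, z: int, ret_addr: int) -> int:
    """The long CALL pushes: {C, Z, 10'b0, PC[19:0]}.

    Bit 31 = C, bit 30 = Z, bits 29..20 = the ten zero bits, bits 19..0 = PC.
    """
    return ((c & 1) << 31) | ((z & 1) << 30) | (ret_addr & PC_MASK)


def call_target(pc_next: int, a: int, r: int) -> int:
    """New PC: "If R = 1 then PC += A, else PC = A"; a "\\" in the source forces R = 0.

    Assumes pc_next is the address of the instruction after the CALL and that,
    for a relative call, a is a signed offset (20-bit two's complement also works).
    """
    if r:
        return (pc_next + a) & PC_MASK  # relative: the sum wraps within 20 bits
    return a & PC_MASK                   # absolute


if __name__ == "__main__":
    # CALL at cog address $045 to a label at $050 (cogRAM, so the next PC is $046).
    pc_next = 0x046
    print(f"${call_frame(c=1, z=0, ret_addr=pc_next):08X}")      # $80000046
    print(f"${call_target(pc_next, 0x050 - pc_next, r=1):03X}")  # CALL #label  -> $050
    print(f"${call_target(pc_next, 0x050, r=0):03X}")            # CALL #\label -> $050
```

Both encodings reach $050; only the number stored in the instruction differs (an offset of $00A versus the address $050).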

Andy

Bob Drury · 2021-10-01 02:48

Andy, thanks for the reply. I have a couple more questions; I appreciate the help.

1) Is there an existing write up detailing the CALL statements and the stack?
2) What is the address of the CALL/RET stack?
3) When the CALL is made, I assume 1 long of the stack would be used for {C, Z, 10'b0, PC[19:0]}, which is 32 bits,
and the other 7 registers would be available for program usage. Does the CALL load the low value or the high value?

Thanks for the information. Regards,
Bob (WRD)

rogloh · 2021-10-01 04:54

For questions 2) & 3) this 8 level stack is not accessible at any address, it is just an 8 entry circular FIFO internal to the COG HW. You can only PUSH and POP register data from it and also do CALL or RET using it. PUSH and CALL write a new 32 bit data value to this stack and POP or RET remove the value. That's all you can do with it. If you overflow or underflow it will wrap around. It is not addressable.

Cluso99 · 2021-10-01 05:49

Effectively, it does not wrap around.
If you push an extra one, it will just drop the bottom entry as everything pushes down. Same goes for popping. As you pop one off, they all move up the FIFO, and the bottom entry is copied up, so when you pop them all off, the stack will contain 8 identical values, equal to the final one popped.
The hardware is a bit different in that it has a pointer to the top of stack into the 8 internal registers, but that is irrelevant to its perceived operation.

rogloh · 2021-10-01 07:25

Interesting. So basically it appears as if the data itself moves up and down in the queue, but the "pointer" being used is essentially fixed at the top.

Cluso99 · 2021-10-01 07:57

Yes

TonyB_ · 2021-10-01 09:48

@rogloh said:

Interesting. So basically it appears as if the data itself moves up and down in the queue, but the "pointer" being used is essentially fixed at the top.

The hardware stack has no stack pointer and is implemented as a big (8*32=256-bit) shift register [I've seen the Verilog]. All eight longs are shifted by PUSH/CALL and seven longs are shifted by POP/RET, as Cluso described.
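The behaviour described above (no pointer, everything shifts) can be sketched in Python as a list of eight 32-bit values. This is a model of the perceived operation only, not the Verilog; the class and method names are made up for illustration.

```python
class HardwareStack:
    """Sketch of the cog's 8-level stack as a 256-bit shift register (8 x 32-bit longs).

    regs[0] is the top of stack. There is no stack pointer in this model.
    """

    DEPTH = 8

    def __init__(self) -> None:
        self.regs = [0] * self.DEPTH

    def push(self, value: int) -> None:
        """PUSH/CALL: all eight longs shift down; the old bottom long is lost."""
        self.regs = [value & 0xFFFFFFFF] + self.regs[:-1]

    def pop(self) -> int:
        """POP/RET: the top seven shift up; the bottom long is copied, not cleared."""
        top = self.regs[0]
        self.regs = self.regs[1:] + [self.regs[-1]]
        return top


if __name__ == "__main__":
    s = HardwareStack()
    for v in range(1, 10):   # nine pushes into eight levels: 1 falls off the bottom
        s.push(v)
    print(s.regs)                        # [9, 8, 7, 6, 5, 4, 3, 2]
    print([s.pop() for _ in range(8)])   # [9, 8, 7, 6, 5, 4, 3, 2]
    print(s.regs)                        # [2, 2, 2, 2, 2, 2, 2, 2]
```

A ninth pop would return 2 again, which is the "underflow" case: nothing wraps, the last value just keeps coming back.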

Bob Drury · 2021-10-01 15:14

Here is a summary of the CALL discussion as I understand it. If something is wrong, please let me know.
Regards
Bob (WRD)

evanh · 2021-10-01 19:34

Better title: "Calling Routines, And The Hardware Stack". PUSH and POP aren't exclusively for calling purposes. For example, parameters for the routine can be placed on the stack using PUSH. Admittedly, with its limited space, you don't really want to be doing that on the hardware stack. On that note, I can see this document being expanded to describe all calling instructions.

There are multiple call instructions that use the hardware stack. There are CALLPA/CALLPB (not to be confused with CALLA/CALLB) and even two versions of the identically named CALL: the regular CALL has register direct and 9-bit immediate modes, whereas the second version is only 20-bit immediate mode. Assemblers automatically select whichever one suits ... (Huh, just noticed there are also two versions of CALLD with the same effect. I hadn't noticed that before.)

Unlike the regular CALL/CALLA/CALLB, which are single operand instructions, CALLPA/CALLPB are double operand instructions. The extra is for specifying a parameter that gets passed into the routine by copying it to register PA or PB accordingly.

EDIT: Just discovered there is no 9-bit immediate mode. Makes sense since the 20-bit immediate version has same features. Which makes the 20-bit immediate version the regular one. Anyway, there's a total of four call instructions that all use the hardware stack ... And six that don't.
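To make the CALL versus CALLPA/CALLPB difference concrete, here is a hypothetical Python model of just the cog state those instructions touch. Assumptions: cog execution (next instruction = PC + 1); the real stack's 8-level limit is not modelled; and RET restores C/Z only when WC/WZ is given, which is my understanding of PASM2 rather than something stated in this thread. All names are invented for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Cog:
    """Hypothetical model of the cog state that CALL, CALLPA, CALLPB and RET touch.

    Assumes cog execution (next instruction = PC + 1). The 8-level depth limit
    of the real stack is not modelled; stack[-1] is the top of stack.
    """
    pc: int = 0
    c: int = 0
    z: int = 0
    pa: int = 0
    pb: int = 0
    stack: list[int] = field(default_factory=list)

    def call(self, target: int) -> None:
        ret_addr = (self.pc + 1) & 0xFFFFF
        self.stack.append((self.c << 31) | (self.z << 30) | ret_addr)  # {C, Z, 10'b0, PC}
        self.pc = target & 0xFFFFF

    def callpa(self, param: int, target: int) -> None:
        self.pa = param & 0xFFFFFFFF   # the extra operand is copied into PA ...
        self.call(target)              # ... then it is an ordinary stack call

    def callpb(self, param: int, target: int) -> None:
        self.pb = param & 0xFFFFFFFF   # same, but into PB
        self.call(target)

    def ret(self, wc: bool = False, wz: bool = False) -> None:
        frame = self.stack.pop()
        self.pc = frame & 0xFFFFF
        if wc:
            self.c = (frame >> 31) & 1  # assumed: flags restored only when asked for
        if wz:
            self.z = (frame >> 30) & 1


if __name__ == "__main__":
    cog = Cog(pc=0x045)
    cog.callpa(1234, 0x100)
    print(cog.pa, hex(cog.pc))   # 1234 0x100
    cog.ret()
    print(hex(cog.pc))           # 0x46
```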

Bob Drury · 2021-10-02 00:01

Thanks for the feedback. I have created a separate section for sample CALL instructions. I used the name "12.9) Calling SubRoutines".
I just finished CALL #{\}A and CALL D. It is going to take a little time to go through all the CALLs and create examples.

Regards
Bob (WRD)

evanh · 2021-10-02 03:15

Oh, CALL #label (without the \) isn't always a relative branch. When crossing area boundaries, eg: hubRAM to cogRAM, the assembler will automatically make it an absolute address.

It actually is possible to use relative branching to cross area boundaries but, due to the different address granularities used between those areas, relative branching from hubRAM back to either cogRAM or lutRAM can only land on every fourth address. Therefore only absolute branching is generated by assemblers and compilers alike.

PS: Relative branching between areas is not just hypothesised, it has been tested with hand encoded machine code.
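The assembler behaviour evanh describes can be sketched as a simple rule: relative only when the call and its target are in the same area, absolute otherwise or when "\" is written. This is a hypothetical illustration using the address ranges given earlier in the thread, not the source of any real assembler.

```python
def region(addr: int) -> str:
    """Execution area for an address, using the ranges given earlier in the thread."""
    if addr < 0x200:
        return "cog"      # $000..$1FF
    if addr < 0x400:
        return "lut"      # $200..$3FF (LUTexec)
    return "hub"          # $400 and up (HUBexec)


def choose_call_mode(call_addr: int, target: int, backslash: bool = False) -> str:
    """Hypothetical assembler rule: relative only when staying inside one area."""
    if backslash or region(call_addr) != region(target):
        return "absolute"  # encoded with R = 0
    return "relative"      # encoded with R = 1


if __name__ == "__main__":
    print(choose_call_mode(0x045, 0x050))                  # relative
    print(choose_call_mode(0x045, 0x050, backslash=True))  # absolute
    print(choose_call_mode(0x01000, 0x045))                # absolute (hub -> cog)
```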

evanh · 2021-10-02 03:33

That difference in address granularity has been the bane of many an architect throughout the history of computers. Should the CPU be optimised for its native word size or should it support sub-word sizes like "byte" at the cost of extra address bits and inline data muxes?

Chip may not have resolved the dilemma with instruction fetching but I think he did a bang up job of it with the ALTxx instructions for indexing sub-word sizes within cogRAM. Namely, each data size is indexed at its natural granularity relative to a base address.

EDIT: Reworded for clarity.
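A rough Python sketch of "each data size is indexed at its natural granularity relative to a base address". The assumption, as I understand ALTGN/ALTGB/ALTGW-style indexing, is that the index counts nibbles, bytes or words rather than longs: the upper index bits are added to the base register and the low bits select the sub-field. The function and table names are hypothetical.

```python
PER_LONG = {"nibble": 8, "byte": 4, "word": 2}  # sub-words per 32-bit long


def subword_location(base: int, index: int, size: str) -> tuple[int, int]:
    """(cog register, sub-field number) of element `index` in an array starting at `base`.

    The index counts elements of `size`, not longs: the register is base plus
    index // PER_LONG[size], and the low bits of index pick the sub-field.
    """
    n = PER_LONG[size]
    shift = n.bit_length() - 1                 # 8 -> 3, 4 -> 2, 2 -> 1
    reg = (base + (index >> shift)) & 0x1FF    # cogRAM register addresses are 9 bits
    return reg, index & (n - 1)


if __name__ == "__main__":
    for size in ("nibble", "byte", "word"):
        reg, sub = subword_location(0x100, 5, size)
        print(size, f"${reg:03X}", sub)
```

With a base of $100, element 5 lands in register $100 nibble 5, register $101 byte 1, or register $102 word 1, so the same index works for any size.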
