RFLONG works in hub exec

TonyB_ · 2025-12-19 12:41

I have tested RFLONG in hub exec and it works, contrary to what the documentation says. It removes the long after the next instruction from the FIFO. Possible uses for this are (a) to read long data embedded within hub exec code, maybe put there by another cog and (b) to skip an instruction without any time penalty.

Examples of (a) with one to four RFLONGs:

      rflong a      'a = $a_data
      instr2
      long  $a_data

      rflong a      'a = $a_data
      rflong b      'b = $b_data
      long  $a_data
      instr3
      long  $b_data

      rflong a      'a = $a_data
      rflong b      'b = $b_data
      long  $a_data
      rflong c      'c = $c_data
      long  $b_data
      instr4
      long  $c_data

      rflong a      'a = $a_data
      rflong b      'b = $b_data
      long  $a_data
      rflong c      'c = $c_data
      long  $b_data
      rflong d      'd = $d_data
      long  $c_data
      instr5
      long  $d_data

instrN is Nth instruction executed in the code snippet and can be anything except RFLONG.
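
The order in which the longs are taken can be checked with a toy model. The Python sketch below is only an illustration, not P2 code. It assumes a one-deep instruction prefetch (the real pipeline is deeper), and `run` and the tuple encoding of the stream are made-up names. It replays the four-RFLONG example and reports which register receives which long and what actually executes.

```python
from collections import deque

def run(stream):
    """Toy model: each RFLONG takes the long that follows the NEXT instruction.

    stream items are ("rflong", reg), ("instr", name) or ("long", value).
    Assumption: the next instruction has already left the FIFO when an
    RFLONG executes, so the RFLONG sees whatever follows it.
    """
    fifo = deque(stream)
    regs, executed = {}, []
    pipeline = fifo.popleft() if fifo else None      # first instruction fetched
    while pipeline is not None:
        current = pipeline
        pipeline = fifo.popleft() if fifo else None  # prefetch next instruction
        kind, arg = current
        if kind == "rflong":
            _, value = fifo.popleft()    # take the long now at the FIFO head
            regs[arg] = value
            executed.append("rflong " + arg)
        elif kind == "instr":
            executed.append(arg)
        else:
            raise RuntimeError("data long reached the pipeline: %r" % (arg,))
    return regs, executed

four = [
    ("rflong", "a"), ("rflong", "b"), ("long", "a_data"),
    ("rflong", "c"), ("long", "b_data"),
    ("rflong", "d"), ("long", "c_data"),
    ("instr", "instr5"), ("long", "d_data"),
]
regs, executed = run(four)
print(regs)       # each register paired with its own data long
print(executed)   # four RFLONGs, then instr5 as the fifth instruction
```

In this model no data long ever reaches the pipeline, which matches the layout rule in the examples: each `long` sits directly after the instruction that follows its RFLONG.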

The main advantage of using RFLONG is that it takes only two cycles, much faster than the 9...26 cycles (+1 if long crossing) for RDxxxx in hub exec mode. Note that if you wait sufficient time after a hub exec branch for FIFO filling to finish, RDxxxx can take the same as in cog exec, 9...16 cycles (+1 if long crossing).

An example of (b):

if_nz rflong temp
      instr2
_ret_ instr3
'return after executing instr3 if z
'continue if nz

Using RFLONG to skip an instruction is equivalent to SKIPF with fixed skip pattern of %10. Hub exec doesn't support SKIPF itself. RFLONG removes the skipped instruction from the FIFO before it enters the instruction pipeline so it wastes no cycles, unlike SKIP.
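
A toy model makes the %10 equivalence concrete. The Python sketch below is an illustration only, again assuming a one-deep prefetch. `run_skip` is a made-up name and `instr4` is a hypothetical instruction standing in for whatever follows the snippet.

```python
from collections import deque

def run_skip(nz):
    """Model 'if_nz rflong temp / instr2 / _ret_ instr3 / instr4'.

    Assumption: instr2 has already been fetched when RFLONG executes, so
    the FIFO head that RFLONG removes is the '_ret_ instr3' long.
    """
    fifo = deque(["if_nz rflong temp", "instr2", "_ret_ instr3", "instr4"])
    executed = []
    pipeline = fifo.popleft()
    while pipeline is not None:
        current = pipeline
        pipeline = fifo.popleft() if fifo else None   # prefetch next instruction
        if current == "if_nz rflong temp":
            if nz:
                fifo.popleft()        # '_ret_ instr3' leaves the FIFO unexecuted
            continue                  # RFLONG itself is not listed in the result
        executed.append(current)
        if current.startswith("_ret_"):
            break                     # return: nothing after this runs here
    return executed

print(run_skip(nz=True))    # ['instr2', 'instr4']
print(run_skip(nz=False))   # ['instr2', '_ret_ instr3']
```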

RFBYTE, RFWORD and RFVAR{S} did not work in hub exec mode in my tests. However, all RFxxxx can be used in cog exec after a routine in hub RAM returns. The data that RFxxxx reads starts four bytes after the routine because the long directly after it has been read already as part of instruction pipelining. Note that a RDFAST is not needed before RFxxxx in this case because the former was done during the branch from cog to hub exec.

EDIT:
Added RFxxxx vs. RDxxxx time comparison
EDIT2:
Text corrected after further testing
EDIT3:
Added multiple examples of (a)

evanh · 2025-12-19 20:50

Thieving data makes sense. And I'd come to the conclusion the FIFO doesn't terminate. It just stops being used.

TonyB_ · 2025-12-19 23:29

@evanh said:
Thieving data makes sense. And I'd come to the conclusion the FIFO doesn't terminate. It just stops being used.

Yes and the good thing is that RFLONG doesn't interfere with an instruction fetch in hub exec.

To re-iterate, it's possible to do the following:
From cog/LUT exec, call a routine in hub RAM then after it returns do RFxxxx's to read data starting four bytes after the end of the routine, without ever needing a RDFAST.

Rayman · 2025-12-19 23:46

Hmm... I was also under the impression that one can't touch the fifo in hubexec mode...

evanh · 2025-12-20 00:06

@Rayman said:
Hmm... I was also under the impression that one can't touch the fifo in hubexec mode...

It will crash your program if not carefully crafted.

TonyB_ · 2025-12-20 00:22

deleted

rogloh · 2025-12-20 03:33

This looks potentially useful for some future things. Perhaps it could pass a fixed 32 bit argument to some common cogexec routine called via different hubexec codepaths saving a couple of cycles, or for a parameter / return address? to an execf code block in cog RAM from hubexec code. That could allow a nice way to "return" back to hubexec from conditionally called execf code sprinkled throughout your hubexec code. Eg to return back to your hubexec code you could do:

in hubRAM:

rflong retaddr  ' read return address before conditionally branching into cogexec code
if_z execf cogsvc ' conditionally do cog service
long $+4 ' address of next long in hub
<next_instr> ' next hubexec instruction after cogexec mode completes continues here

in cogRAM:

cogsvc  long  %101001110 << 10 + cogexecsvc ' skipf pattern in upper bits, start address in lower 10 bits
retaddr long 0 ' place to store return address

cogexecsvc <svc_instr1>
<svc_instr2>
...
jmp retaddr ' continue with hub exec from where it was called

This probably does incur a penalty however when returning due to hub fifo refill. But if your COG RAM is full and you can't break apart the service routines and can conveniently pack them together with execf it could still be helpful.

Rayman · 2025-12-20 20:38

I'm trying to understand that first example...

So x and y are cog registers associated with the cog that is running the hubexec code, got that.
No "RDFAST" command is needed?

You just drop in a rflong x and then x gets loaded with hub data at PC+8?

What is with the:
<instr2> and <instr4>
.. are these actually executed?
Or, are you saying they are skipped?

evanh · 2025-12-20 21:21

The long $x_data and long $y_data are not executed because they are stolen out of the FIFO by each respective RFLONG. The FIFO is happy to give up its content to whichever path does the grabbing. It still behaves as a FIFO.

Applies to the streamer too: If the cog reads the FIFO between streamer reads then their respective data can be described as interleaved in hubRAM. A streamer read could also coincide with a cog read. That would count as a single read to the FIFO, and I don't know if both would see the same data. Probably would.

evanh · 2025-12-20 21:27

@evanh said:
The long $x_data and long $y_data are not executed because they are stolen out of the FIFO by each respective RFLONG. The FIFO is happy to give up its content to whichever path does the grabbing. It still behaves as a FIFO.

This being reliable probably implies the two paths are clocked on alternate clocks. Which will possibly just be incidental in terms of FIFO. Chip won't have explicitly designed it for the FIFO that way.

Rayman · 2025-12-20 21:43

seems you can't use <instr2> and <instr4> in a post without marking it as code...
Fixed my post above just now...

evanh · 2025-12-20 21:51

@Rayman said:
seems you can't use <instr2> and <instr4> in a post without marking it as code...
Fixed my post above just now...

Correct, hubexec is operating the whole time. It always takes from the FIFO on every second clock cycle. The FIFO will in turn read from hubRAM at a faster rate than regular hubexec because the RFLONGs are consuming extra longwords from the FIFO on the alternate clock cycles.

There's definitely extra demands that could have gone crappy but don't.

TonyB_ · 2025-12-20 22:32

@Rayman said:
I'm trying to understand that first example...

So x and y are cog registers associated with the cog that is running the hubexec code, got that.
No "RDFAST" command is needed?

You just drop in a rflong x and then x gets loaded with hub data at PC+8?

What is with the:
<instr2> and <instr4>
.. are these actually executed?
Or, are you saying they are skipped?

<instr2> and <instr4> are the 2nd and 4th instructions executed in the code example and are not affected by RFLONG. I labelled them that way to differentiate instructions from data.

EDIT:
Changed <instr2> and <instr4> to instr2 and instr4 in first post.

TonyB_ · 2025-12-20 22:40

@evanh said:

@evanh said:
The long $x_data and long $y_data are not executed because they are stolen out of the FIFO by each respective RFLONG. The FIFO is happy to give up its content to whichever path does the grabbing. It still behaves as a FIFO.

This being reliable probably implies the two paths are clocked on alternate clocks. Which will possibly just be incidental in terms of FIFO. Chip won't have explicitly designed it for the FIFO that way.

RFLONG in hub exec should be tested by others to confirm it's 100% reliable, or not.

TonyB_ · 2025-12-21 00:05

After further testing, RFLONG works in hub exec but other RFxxxx do NOT. Apologies for incorrect info.

In previous testing, all RFxxxx work in cog exec after calling a routine in hub RAM, as described above.

rogloh · 2025-12-21 00:08

@TonyB_ said:
After further testing, RFLONG works in hub exec but other RFxxxx do NOT. Apologies for incorrect info.

In previous testing, all RFxxxx work in cog exec after calling a routine in hub RAM, as described above.

That's still useful. Makes sense given P2 instruction alignment.

TonyB_ · 2025-12-21 00:16

@rogloh said:

@TonyB_ said:
After further testing, RFLONG works in hub exec but other RFxxxx do NOT. Apologies for incorrect info.

In previous testing, all RFxxxx work in cog exec after calling a routine in hub RAM, as described above.

That's still useful. Makes sense given P2 instruction alignment.

To be honest RFBYTE, RFWORD & RFVAR{S} were inviting trouble so no real loss.

AJL · 2025-12-21 00:45

What interaction does this ingenious approach have with the program counter and therefore relative addressing?

It would be helpful to know this for any documentation updates and for flagging this as an ‘advanced’ coding technique that may need special handling or avoidance of mixing the techniques.

AJL · 2025-12-21 00:54

I’ll also note that use case (b) seems limited in applicability, as in many cases you can just use the opposite conditional execution flag on the instruction to be skipped and a following return instruction. Saving 2 cycles and one long might not be worth it for the reduction in comprehensibility of the code.

TonyB_ · 2025-12-21 01:35

@AJL said:
What interaction does this ingenious approach have with the program counter and therefore relative addressing?

It would be helpful to know this for any documentation updates and for flagging this as an ‘advanced’ coding technique that may need special handling or avoidance of mixing the techniques.

That is a very good question and after more testing I can tell you the answer:

A long that is read from the FIFO by RFLONG in hub exec never enters the instruction pipeline and PC is not incremented by four. This affects relative addressing as PC for every instruction following the read long does not match the actual address in hub RAM until there is a branch to an absolute address.
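
To make the drift concrete, the sketch below tabulates PC against the real hub address for the two-RFLONG example from the first post. It is a Python illustration, not a measurement. The start address $1000 is arbitrary, `instr4` is a hypothetical instruction added after the example, and the only rule applied is the one just stated: a long taken by RFLONG does not advance PC.

```python
def pc_table(listing, start):
    """Return (hub_address, pc, text) rows; pc is None for longs taken by RFLONG.

    Rule assumed from the post: every long that enters the instruction
    pipeline advances PC by 4; a long taken by RFLONG does not.
    """
    rows, pc = [], start
    for i, (text, stolen) in enumerate(listing):
        addr = start + 4 * i
        if stolen:
            rows.append((addr, None, text))
        else:
            rows.append((addr, pc, text))
            pc += 4
    return rows

listing = [
    ("rflong a", False),
    ("rflong b", False),
    ("long $a_data", True),
    ("instr3", False),
    ("long $b_data", True),
    ("instr4", False),       # hypothetical instruction following the example
]
rows = pc_table(listing, 0x1000)
for addr, pc, text in rows:
    lag = "taken by RFLONG" if pc is None else "PC lags by %d" % (addr - pc)
    print("%05X  %-14s %s" % (addr, text, lag))
```

In this model instr3 runs with PC four bytes behind its hub address and instr4 eight bytes behind, one long per RFLONG.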

TonyB_ · 2025-12-21 01:46

@AJL said:
I’ll also note that use case (b) seems limited in applicability, as in many cases you can just use the opposite conditional execution flag on the instruction to be skipped and a following return instruction. Saving 2 cycles and one long might not be worth it for the reduction in comprehensibility of the code.

Case (b) deserves a mention at the very least because this is how I discovered what RFLONG in hub exec does. I put RFLONG two instructions before a block move SETQ #511 and found that only one long was moved, not 512.

rogloh · 2025-12-21 09:12

I wonder if this finding can somehow be used as an escape for a branch to some external memory address while running in HUB exec mode - sort of like a "far" jump.... obviously that needs toolchain support to generate the correct code but it might be a good way to conditionally branch/call code in external (non-hub) memory, potentially via some intermediate COG exec code. I've done some work a while back on this idea with code caching from PSRAM and was even able to get MicroPython to run from external RAM, it might be time to reexamine that now too with this new capability found.

TonyB_ · 2025-12-21 12:16

More testing:
(1) Three consecutive RFLONG + NOP + LONG in hub exec, followed by CALL to LUT RAM routine that pops return address and does GETPTR.
(2) Nine NOPs instead of 3 x RFLONG + NOP + LONG, otherwise same as (1).

Results:
Return address for (2) = address of instruction after CALL
Return address for (1) = address of instruction after CALL - 12
GETPTR for (1) and (2) = address of second instruction after CALL
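
These results fit a simple rule: each long taken by RFLONG leaves the stacked return address four bytes short. A small Python check of that arithmetic follows. The function names and the sample address are made up, and the rule is inferred from the two tests here, not taken from any documentation.

```python
def stacked_return(addr_after_call, rflongs_before_call):
    """Return address pushed by CALL under the inferred rule:
    PC misses 4 bytes for every long that RFLONG removed from the FIFO."""
    return addr_after_call - 4 * rflongs_before_call

def true_return(stacked, rflongs_before_call):
    """Undo the shortfall to recover the real hub address after CALL."""
    return stacked + 4 * rflongs_before_call

after_call = 0x2028                          # arbitrary example address
case1 = stacked_return(after_call, 3)        # test (1): three RFLONGs
case2 = stacked_return(after_call, 0)        # test (2): nine NOPs
print(hex(case1), hex(case2))
print(hex(true_return(case1, 3)))
```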

evanh · 2025-12-21 14:44

Hmm, getting convoluted having to switch to using GETPTR and offset ... or compiler could add a calculated offset to the stacked address. Both have overheads. Relative branching is broken either way. Puts a damper on uses of this hack.

TonyB_ · 2025-12-21 15:34

@evanh said:
Hmm, getting convoluted having to switch to using GETPTR and offset ... or compiler could add a calculated offset to the stacked address. Both have overheads. Relative branching is broken either way. Puts a damper on uses of this hack.

CALL is maybe best avoided after RFLONG. Relative jumps within the code block after last LONG will work unchanged. Relative jumps to elsewhere will work if +4 added to branch address for each RFLONG in source code.

mwroberts · 2025-12-21 15:59

After returning from hub execution to cog execution, is the FIFO shut down immediately, or would a rflong from cog PASM get the next addressed hub long after the RET?

These FIFO values are static once in the pipeline? A write to a hub address from a second process won't change a FIFO value about to be read from the first process?

TonyB_ · 2025-12-21 17:45

@mwroberts said:
After returning from hub execution to cog execution, is the FIFO shut down immediately, or would a rflong from cog PASM get the next addressed hub long after the RET?

These FIFO values are static once in the pipeline? A write to a hub address from a second process won't change a FIFO value about to be read from the first process?

The FIFO keeps working after a hub exec routine returns to cog exec. All RFxxxx work in cog exec and the first RFxxxx reads data starting at four bytes after the end of the hub exec routine. The long directly after the routine (at offset zero bytes) is read and put in the instruction pipeline by hub exec but never executed due to the return.

After a long has been read from hub RAM address xxxxx into the FIFO, a subsequent write to xxxxx will not change what is in the FIFO.
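
Both answers can be pictured with a toy model in Python. This is an assumed simplification: the `Fifo` class, its depth and the addresses are invented for the illustration, and the real FIFO refills continuously. Longs are copied into the FIFO when it loads, so a later hub write does not reach a copy already taken, and the first RFLONG after a return sees the long four bytes past the end of the routine.

```python
from collections import deque

class Fifo:
    """Toy FIFO that copies hub longs when loaded (no refill modelled)."""
    def __init__(self, hub, start, depth):
        self.q = deque((a, hub[a]) for a in range(start, start + 4 * depth, 4))

    def read(self):
        return self.q.popleft()          # (hub address, value as loaded)

# Hypothetical hub image: the routine ends with 'ret' at $400.
hub = {0x400: "ret", 0x404: "long0", 0x408: "data0", 0x40C: "data1"}
fifo = Fifo(hub, 0x400, depth=4)

fifo.read()                  # 'ret' enters the pipeline and executes
fifo.read()                  # long at $404 is fetched too, never executed

hub[0x408] = "changed"       # another cog writes hub RAM after the load
addr, value = fifo.read()    # first RFLONG in cog exec after the return
print(hex(addr), value)      # 0x408 data0
```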

evanh · 2025-12-21 21:08

@TonyB_ said:
... Relative jumps to elsewhere will work if +4 added to branch address for each RFLONG in source code.

Good point, it doesn't cost anything in the compiled code. So, to avoid the stack call errors, best kept to leaf functions then.

TonyB_ · 2025-12-21 22:03

@evanh said:

@TonyB_ said:
... Relative jumps to elsewhere will work if +4 added to branch address for each RFLONG in source code.

Good point, it doesn't cost anything in the compiled code. So, to avoid the stack call errors, best kept to leaf functions then.

I've made similar adjustments to addresses during skipping:

'N.B. if N skipped instructions directly
'after callpx D,#S then subtract N from #S

callpb  pa,#readb_dest_rm_src_reg - 6   '||b|||r|||||
callpb  pa,#readb_dest_reg_src_rm - 5   '||||c|||||||
callpb  pa,#readw_dest_rm_src_reg - 5   '|||B|||R||||
callpb  pa,#readw_dest_reg_src_rm - 4   '|||||C||||||

First mentioned here some time ago and another thing the doc says can't be done!

evanh · 2025-12-21 23:09

It's the stacked return address that gets messed up here though. Best avoided altogether, hence leafing only. For clarity, I should have made the two separate statements as two paragraphs in the previous post.

cgracey · 2025-12-22 04:12

TonyB_, this is a really interesting discovery!

I am wondering how it could be used to consolidate the Spin2 interpreter code. Maybe returning from Hub code to Cog code, I could optionally do an RFLONG in some cases to pull special data. Interesting possibilities exist.
