RFLONG works in hub exec
I have tested RFLONG in hub exec and it works, contrary to what the documentation says. It removes the long after the next instruction from the FIFO. Possible uses for this are (a) to read long data embedded within hub exec code, maybe put there by another cog and (b) to skip an instruction without any time penalty.
An example of (a):
rflong x 'x = $x_data
instr2
long $x_data
rflong y 'y = $y_data
instr4
long $y_data
instr2 and instr4 are the 2nd and 4th instructions executed in this code snippet. They can be any instruction except RFLONG.
The main advantage of using RFLONG is that it takes only two cycles, much faster than the 9...26 cycles (+1 if long crossing) for RDxxxx in hub exec mode. Note that if you wait long enough after a hub exec branch for FIFO filling to finish, RDxxxx can take the same time as in cog exec, 9...16 cycles (+1 if long crossing).
An example of (b):
if_nz rflong temp
instr2
_ret_ instr3
'return after executing instr3 if z
'continue if nz
Using RFLONG to skip an instruction is equivalent to SKIPF with fixed skip pattern of %10. Hub exec doesn't support SKIPF itself. RFLONG removes the skipped instruction from the FIFO before it enters the instruction pipeline so it wastes no cycles, unlike SKIP.
RFBYTE, RFWORD and RFVAR{S} did not work in hub exec mode in my tests. Consecutive RFLONGs also did not work. However, all RFxxxx can be used in cog exec after a routine in hub RAM returns. The data that RFxxxx reads starts four bytes after the routine because the long directly after it has already been read as part of instruction pipelining. Note that a RDFAST is not needed before RFxxxx in this case because it was done during the branch from cog to hub exec.
EDIT:
Added RFxxxx vs. RDxxxx time comparison.
EDIT2:
Text corrected after further testing.

Comments
Thieving data makes sense. And I'd come to the conclusion the FIFO doesn't terminate. It just stops being used.
Yes and the good thing is that RFLONG doesn't interfere with an instruction fetch in hub exec.
To re-iterate, it's possible to do the following:
From cog/LUT exec, call a routine in hub RAM then after it returns do RFxxxx's to read data starting four bytes after the end of the routine, without ever needing a RDFAST.
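A minimal sketch of that sequence, assuming the behaviour reported in the first post. The names hub_routine, table, a, b and count are hypothetical, the layout has not been tested, and the pad long stands in for the long that the post says is already consumed by instruction pipelining:

```
                org     0                       ' cog exec code
                call    #\hub_routine           ' branch to hub exec; FIFO is set up by the branch
                rflong  a                       ' per the post: reads table[0], no RDFAST issued
                rflong  b                       ' reads table[1]
                ' ... continue in cog exec ...

a               long    0                       ' hypothetical cog registers
b               long    0
count           long    0

                orgh
hub_routine     add     count, #1               ' any hub exec work
        _ret_   add     count, #1               ' last instruction, returns to cog exec
                long    0                       ' pad: assumed already fetched by the pipeline
table           long    $1111_1111              ' data starts four bytes after the routine
                long    $2222_2222
```

Whether any delay is needed between the return and the first RFxxxx is not stated in the thread, so treat the timing as an open question.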
Hmm... I was also under the impression that one can't touch the fifo in hubexec mode...
It will crash your program if not carefully crafted.
This looks potentially useful for some future things. Perhaps it could pass a fixed 32 bit argument to some common cogexec routine called via different hubexec codepaths saving a couple of cycles, or for a parameter / return address? to an execf code block in cog RAM from hubexec code. That could allow a nice way to "return" back to hubexec from conditionally called execf code sprinkled throughout your hubexec code. Eg to return back to your hubexec code you could do:
in hubRAM:
in cogRAM:
This probably does incur a penalty however when returning due to hub fifo refill. But if your COG RAM is full and you can't break apart the service routines and can conveniently pack them together with execf it could still be helpful.
I'm trying to understand that first example...
So x and y are cog registers associated with the cog that is running the hubexec code, got that.
No "RDFAST" command is needed?
You just drop in a rflong x and then x gets loaded with hub data at PC+8?
What is with the:
<instr2> and <instr4>.. are these actually executed?
Or, are you saying they are skipped?
The long $x_data and long $y_data are not executed because they are stolen out of the FIFO by each respective RFLONG. The FIFO is happy to give up its content to whichever path does the grabbing. It still behaves as a FIFO.
Applies to the streamer too: If the cog reads the FIFO between streamer reads then their respective data can be described as interleaved in hubRAM. The streamer's read could also coincide with the cog's read; that would count as a single read to the FIFO and I don't know if both would see the same data. Probably would.
This being reliable probably implies the two paths are clocked on alternate clocks. Which will possibly just be incidental in terms of FIFO. Chip won't have explicitly designed it for the FIFO that way.
Seems you can't use <instr2> and <instr4> in a post without marking it as code... Fixed my post above just now...
Correct, hubexec is operating the whole time. It always takes from the FIFO on every second clock cycle. The FIFO will in turn read from hubRAM at a faster rate than regular hubexec because the RFLONGs are consuming extra longwords from the FIFO on the alternate clock cycles.
There's definitely extra demands that could have gone crappy but don't.
<instr2> and <instr4> are the 2nd and 4th instructions executed in the code example and are not affected by RFLONG. I labelled them that way to differentiate instructions from data.
EDIT:
Changed <instr2> and <instr4> to instr2 and instr4 in first post.
RFLONG in hub exec should be tested by others to confirm it's 100% reliable, or not.
After further testing, RFLONG works in hub exec but other RFxxxx do NOT. Apologies for incorrect info.
In previous testing, all RFxxxx work in cog exec after calling a routine in hub RAM, as described above.
That's still useful. Makes sense given P2 instruction alignment.
To be honest RFBYTE, RFWORD & RFVAR{S} were inviting trouble so no real loss.
What interaction does this ingenious approach have with the program counter and therefore relative addressing?
It would be helpful to know this for any documentation updates and for flagging this as an ‘advanced’ coding technique that may need special handling or avoidance of mixing the techniques.
I’ll also note that use case (b) seems limited in applicability, as in many cases you can just use the opposite conditional execution flag on the instruction to be skipped and a following return instruction. Saving 2 cycles and one long might not be worth it for the reduction in comprehensibility of the code.
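For comparison, a sketch of the two forms with concrete stand-ins for instr2 and instr3. The registers count, result and temp are hypothetical, and neither stand-in instruction writes Z, which the conventional form relies on because it tests Z after instr2 rather than before it:

```
' RFLONG form from the first post (hub exec): 3 longs
        if_nz   rflong  temp                    ' if nz, pull the long after the next instruction out of the FIFO
                add     count, #1               ' instr2: always executes
        _ret_   mov     result, count           ' instr3 + return: only reached if z
                ' continue here if nz

' conventional form using condition flags: 3 longs, one of them cancelled when nz
                add     count, #1               ' instr2: always executes
        if_z    mov     result, count           ' instr3: only if z
        if_z    ret                             ' return only if z
                ' continue here if nz
```

Both sketches occupy three longs here; the difference is that the conventional form still passes its cancelled instructions through the pipeline when nz, whereas the first post reports that the RFLONG form removes the skipped long before it gets there.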
That is a very good question and after more testing I can tell you the answer:
A long that is read from the FIFO by RFLONG in hub exec never enters the instruction pipeline and PC is not incremented by four. This affects relative addressing as PC for instructions following the read long does not match the actual address in hub RAM until there is a branch to an absolute address.
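An annotated sketch of what that implies, assuming a hypothetical start address of $1000 and hypothetical names x, count and next_part. The "PC view" column is inferred from the description above (PC lags the true hub address by four bytes per long consumed) and has not been measured separately:

```
                orgh    $1000
                                                ' true hub addr   PC view
                rflong  x                       ' $1000           $1000
                add     count, #1               ' $1004           $1004
                long    $1234_5678              ' $1008           (never enters pipeline)
                add     count, #1               ' $100C           $1008, PC now lags by 4
                jmp     #\next_part             ' $1010           $100C, absolute branch resyncs PC
next_part       add     count, #1               ' $1014           $1014
```

The `#\` prefix forces absolute addressing in PASM2. A PC-relative branch placed after the data long would presumably land four bytes short of the assembler's intended target, so under this assumption it seems safest to use only absolute branches between the consumed long and the next resync point.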