Comments
RFLONG being used after a hub exec call into cog exec to read parameters from hub RAM reminded me of the MLI interface for ProDOS on the Apple II: a JSR to the MLI entry point was followed by a command byte and a pointer to a parameter list, which was in turn followed by a conditional branch to an error-handling routine. On exit from the OS routine, the carry flag indicated an error and the accumulator held the result or error code. Inside the MLI, the return address on the stack was adjusted to skip past the command block.
https://prodos8.com/docs/techref/calls-to-the-mli/
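To make the analogy concrete, here is a small Python model of that inline-parameter convention. It is a simplification, not 6502 code. The handler is given the address just past the call and reads the command byte plus the 16-bit parameter-list pointer stored there, low byte first. It then returns the address where the caller resumes, three bytes further on. The addresses and the command byte are hypothetical.

```python
def mli_call(memory, after_call):
    """Simplified model of an inline-parameter OS call (not real 6502 code).

    `after_call` is the address of the first byte after the call instruction,
    where the caller placed a 3-byte command block.
    """
    command = memory[after_call]
    # 16-bit pointer to the parameter list, stored low byte first (6502 order)
    param_list = memory[after_call + 1] | (memory[after_call + 2] << 8)
    # the caller resumes just past the command block, at its error branch
    resume_at = after_call + 3
    return command, param_list, resume_at

# Hypothetical layout: call ends at $0803, parameter list at $0900
mem = bytearray(0x1000)
mem[0x0803:0x0806] = bytes([0xC8, 0x00, 0x09])  # command byte, pointer lo, hi
cmd, params, resume = mli_call(mem, 0x0803)
```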
For the P2, one long could be dedicated after the call: the high byte could hold a command and the remaining 24 bits a pointer that is autopadded to an allocated area of hub RAM. Alternatively, it could be one word for each. That seems like a decision for the tool creator.
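A quick Python sketch of the two packings suggested above. The function names, command value and pointer are hypothetical. The 8/24 split leaves plenty of room for any hub RAM address, since the P2's 512 KB fits in 19 bits. A 16-bit pointer in the word/word variant only reaches the first 64 KB, so it would presumably have to be scaled or relative.

```python
def pack_cmd_ptr(command: int, pointer: int) -> int:
    """High byte = command, low 24 bits = hub RAM pointer (hypothetical layout)."""
    if not 0 <= command <= 0xFF:
        raise ValueError("command must fit in 8 bits")
    if not 0 <= pointer <= 0xFFFFFF:
        raise ValueError("pointer must fit in 24 bits")
    return (command << 24) | pointer

def unpack_cmd_ptr(value: int) -> tuple[int, int]:
    """Inverse of pack_cmd_ptr: returns (command, pointer)."""
    return (value >> 24) & 0xFF, value & 0xFFFFFF

def pack_words(command: int, pointer: int) -> int:
    """Alternative: one 16-bit word each, command in the upper word."""
    if not (0 <= command <= 0xFFFF and 0 <= pointer <= 0xFFFF):
        raise ValueError("both fields must fit in 16 bits")
    return (command << 16) | pointer

packed = pack_cmd_ptr(0x12, 0x01F400)
```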
Has anyone tried two RFLONGs in a row?
It seems like it might work.
This is untested but I'm guessing it would look like this (four RFLONGs in a row):
rflong x 'x = $x_data
rflong y 'y = $y_data
long $x_data
rflong v 'v = $v_data
long $y_data
rflong w 'w = $w_data
long $v_data
instr5
long $w_data
instr6
Evan,
I've tested two, three and four RFLONG versions of your code and they all work, well done!
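Why that interleaving works can be reproduced with a small Python model. The model assumes that while one instruction executes, the next one has already been taken from the hub exec FIFO, so each RFLONG pulls the long after that prefetched instruction. This assumption was chosen because it matches the tested layout; it is not taken from the P2 documentation. Placing each data long directly after its RFLONG makes the model fail, because the data long would be fetched as an instruction.

```python
from collections import deque

def run_hubexec(stream):
    """Toy model of RFLONG in hub exec (assumed behaviour, not P2 documentation).

    `stream` lists hub longs in address order, each a tuple:
      ("rflong", reg)  instruction that pops the next FIFO long into reg
      ("long", value)  plain data long
      ("instr", name)  any other instruction
    Assumption: one instruction is always prefetched ahead of the one executing.
    """
    fifo = deque(stream)
    regs, executed = {}, []
    current = fifo.popleft() if fifo else None
    prefetched = fifo.popleft() if fifo else None
    while current is not None:
        kind, arg = current
        if kind == "rflong":
            if not fifo:
                raise ValueError(f"RFLONG {arg}: FIFO is empty")
            data_kind, value = fifo.popleft()
            if data_kind != "long":
                raise ValueError(f"RFLONG {arg} would consume {value!r}, not data")
            regs[arg] = value
        elif kind == "long":
            raise ValueError(f"data long {arg:#x} would be executed as code")
        else:
            executed.append(arg)
        # advance the pipeline: the prefetched instruction runs next
        current = prefetched
        prefetched = fifo.popleft() if fifo else None
    return regs, executed

# Four RFLONGs in a row, laid out as in the post above (data values are arbitrary)
layout = [
    ("rflong", "x"), ("rflong", "y"), ("long", 0x11111111),
    ("rflong", "v"), ("long", 0x22222222),
    ("rflong", "w"), ("long", 0x33333333),
    ("instr", "instr5"), ("long", 0x44444444),
    ("instr", "instr6"),
]
regs, executed = run_hubexec(layout)
```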
But how is this an advantage over:
mov x,##longvalue
mov y,##longvalue
mov v,##longvalue
mov w,##longvalue
Is it faster?
RFLONG is not any faster, but its data are in longs (not part of instructions) and so could be written more easily, perhaps by another cog. RFLONG is much faster than RDLONG. However, RFLONG in hub exec does affect PC: the LONG it consumes does not increment PC by 4.
Actually, RFLONG should be faster, because it takes 2 cycles and steals the extra long out of the pipeline, whereas AUGS+MOV takes 4 cycles. However, in many cases you could use either LOC or AUGS directly into the relevant instruction, which are both just as fast and don't rely on odd implementation details the way RFLONG does.
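Using the figures quoted in this thread, the four loads in the example work out as below. Those figures are 2 cycles for RFLONG, 0 for the LONG it consumes, and 4 for the AUGS + MOV pair that `mov reg,##value` assembles to. Both forms occupy two longs of hub RAM per value, so code size is the same. This is simple arithmetic on the thread's numbers, not a new measurement.

```python
# Cycle and size figures as quoted in this thread for hub exec (not re-measured)
AUGS_MOV_CYCLES = 4    # `mov reg,##value` assembles to AUGS + MOV
RFLONG_CYCLES = 2      # RFLONG itself
DATA_LONG_CYCLES = 0   # the LONG it pulls from the FIFO is never executed
LONGS_PER_VALUE = 2    # AUGS+MOV or RFLONG+LONG: two hub longs either way

def load_cost(n_values: int, use_rflong: bool) -> tuple[int, int]:
    """Return (cycles, hub longs) needed to load n_values 32-bit constants."""
    per_value = (RFLONG_CYCLES + DATA_LONG_CYCLES) if use_rflong else AUGS_MOV_CYCLES
    return n_values * per_value, n_values * LONGS_PER_VALUE

mov_cost = load_cost(4, use_rflong=False)
rflong_cost = load_cost(4, use_rflong=True)
```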
Thanks, you're right, RFLONG is faster.
I knew it too, but I was in a terrible hurry when I wrote my last post.
/me slaps everyone! MOV can't do hubRAM data fetches at all.
In hub exec MOV reads immediate source data from hub RAM.
I thought I should actually test that RFLONG + ... + LONG takes only 2 cycles, and it does (2 for the RFLONG + 0 for the LONG).
AUGS would also mean spreading a 32-bit value over two different instructions, right?
Looks like 23 bits in the AUGS and 9 in the MOV. Not so convenient...
Doh! Sorry, yes, immediate is a valid path. Dunno why I thought it wasn't immediate data. Too tired, I guess. I was trying to stay off the computer and get packed up. Edit: In fact, I was meant to be on the road when I wrote that post.
The compiler already accommodates that with the ## syntax. No effort in the source code.
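For reference, here is what `##` does to a 32-bit constant, sketched in Python with hypothetical function names. The assembler emits an AUGS carrying the upper 23 bits, and the 9-bit S field of the following instruction carries the rest, matching the 23/9 split mentioned above. This also shows why patching such a constant at run time (e.g. from another cog) means rewriting two instruction longs instead of one data long.

```python
def split_imm32(value: int) -> tuple[int, int]:
    """Split a 32-bit immediate the way `##value` does:
    upper 23 bits go into AUGS, lower 9 bits into the instruction's S field."""
    value &= 0xFFFFFFFF
    return value >> 9, value & 0x1FF

def join_imm32(augs_bits: int, s_field: int) -> int:
    """Recombine the AUGS payload and the S field into the original long."""
    return ((augs_bits & 0x7FFFFF) << 9) | (s_field & 0x1FF)

augs_bits, s_field = split_imm32(0x12345678)
```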
Cool, so a small speed advantage.
I agree; also, LOC can handle only 20 bits.
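Putting these size limits together, a hypothetical helper can choose the shortest way to load a constant:

- a plain 9-bit MOV immediate;
- LOC for anything that fits in 20 bits (LOC only writes PA, PB, PTRA or PTRB);
- `##` or RFLONG for full 32-bit values.

It is only a sketch. The returned strings are descriptions, not assembler output.

```python
def cheapest_load(value: int) -> str:
    """Pick the shortest way to get a constant into a register (sketch only).

    A MOV immediate holds 9 bits, LOC holds 20 bits, and anything wider
    needs ## (AUGS + MOV) or RFLONG plus a data long.
    """
    value &= 0xFFFFFFFF  # treat the constant as an unsigned 32-bit long
    if value < (1 << 9):
        return "mov #imm (9-bit, 1 long)"
    if value < (1 << 20):
        return "loc (20-bit, PA/PB/PTRA/PTRB only, 1 long)"
    return "mov ## (AUGS + MOV) or RFLONG + LONG (2 longs)"
```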
I've added code for two to four RFLONGs in a row to the first post.
But having 32 bits of data entirely in one long is better and that's what RFLONG enables.
I've copied your code for successive RFLONGs to the first post on page 1 (which I can't post a link to for some reason). I changed x, y, v, w to a, b, c, d. The first two RFLONGs are consecutive, then comes the first LONG, and thereafter each LONG is separated from the next by one instruction. Merry Xmas all!