Not really. It just seems like a relatively minor feature that could be "hidden" for discovery later. Just thinking about writing code that can reverse (like UNO!) is presenting all sorts of fun little pieces of code.
Where is this COG ROM located in the 512 long address space and how much space does it take?
It currently banks into $000..$007. By the time we add all the features outlined, it will probably grow to over 16 longs. The code for RDWIDEx would probably take an extra 10 longs.
COGNEW D, S/#
--------------
COGNEW starts the lowest-numbered idle cog.
For COGNEW, D specifies a long address in hub memory that is the start of the program that is to be
loaded into the idle cog, while S is an 18-bit parameter (usually an address) that will be conveyed to
PTRA of that cog. PTRB of that cog will be set to the start address of its new program in hub memory,
which is the same as the D value used in the COGNEW instruction, AND'd with $3FFFC to form a hub long
address.
COGNEW will return the number of the started cog (0..7) into D, with C=0 indicating success or C=1
indicating failure, in which case no cog was idle and so D is invalid.
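The COGNEW description above can be modelled in a few lines of Python. This is only a sketch of the documented behaviour, not the hardware logic: the function name `cognew`, the list-of-booleans representation of idle cogs, and the reading of "18-bit parameter" as the low 18 bits of S are all assumptions made for illustration.

```python
HUB_LONG_MASK = 0x3FFFC  # from the text: D is AND'd with $3FFFC
PARAM_MASK = 0x3FFFF     # assumption: "18-bit parameter" = low 18 bits of S


def cognew(cogs_idle, d, s):
    """Model COGNEW D, S/# on a list of 8 idle flags (True = idle).

    Returns (cog_id, c, ptra, ptrb). c=0 means success; c=1 means no cog
    was idle, in which case the other values are None (D is invalid).
    """
    for cog_id, idle in enumerate(cogs_idle):
        if idle:
            cogs_idle[cog_id] = False           # lowest-numbered idle cog starts
            ptra = s & PARAM_MASK               # parameter conveyed to PTRA
            ptrb = d & HUB_LONG_MASK            # program start, long-aligned
            return cog_id, 0, ptra, ptrb
    return None, 1, None, None
```

For example, with cogs 2 and 3 idle, `cognew(cogs, 0x1236, 0x7FFFF)` picks cog 2 and hands it PTRA=$3FFFF and PTRB=$1234.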
If D.bit31 is 0, the cog starts in hubexec mode at the hub location pointed to by D; the hubexec code can load the cog however it wants.
If D.bit31 is 1:
1xxxxxxCCCCCCCCChhhhhhhhhhhh
CCCCCCCCC is the number of longs to load into the cog; regular cog execution, starts at 0
xxxxxx is currently undefined; it could be used as an offset to the cog load start address
hhhhhhhhhhhh is the hub address to load from
This also means there would not be any need for the small per-cog ROM.
Maybe like 'synthetic' programming on the HP-41C. That's the comparison that comes to my mind.
I think you would use RDWIDEx instead. This would start a sustained hub read that you would then transfer with a REPS/MOV combination. Based on recent posts from Chip, I think this approach would be full speed (no stalls). See this post (forum post 1243817) for an example of the code.
But to utilize every hub slot the cog would need to do a RDWIDE followed by 8 MOV instructions within 8 cycles, which isn't possible. At best, it seems like we could only do a RDWIDE every other hub slot, or 16 cycles. Is that correct, or am I missing something? If so, then just doing a RDLONGC with a REPS would be just as fast.
Sorry if I'm missing something obvious here. I haven't been following the last 3 or 4 thousand posts very closely.
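The cycle counting in the question above can be written out as a small calculation. It is a rough sketch that assumes one cycle per instruction and a hub slot every 8 cycles, as stated in the post, and it ignores any REPS setup overhead. The function name `cycles_per_wide` is made up for illustration.

```python
import math

HUB_SLOT_CYCLES = 8  # from the thread: one hub slot every 8 cycles
LONGS_PER_WIDE = 8   # a wide read delivers 8 longs


def cycles_per_wide(foreground_instructions):
    """Cycles between wide reads when each read must land on a hub slot."""
    slots = math.ceil(foreground_instructions / HUB_SLOT_CYCLES)
    return slots * HUB_SLOT_CYCLES


# Foreground RDWIDE: 1 read + 8 MOVs = 9 instructions per wide.
foreground = cycles_per_wide(1 + LONGS_PER_WIDE)
# Background read (the RDWIDEx idea): only the 8 MOVs run in the foreground.
background = cycles_per_wide(LONGS_PER_WIDE)

print("foreground:", foreground, "cycles,", LONGS_PER_WIDE / foreground, "longs/cycle")
print("background:", background, "cycles,", LONGS_PER_WIDE / background, "longs/cycle")
```

Under these assumptions the foreground version misses every other slot (16 cycles per wide), while a read that proceeds in the background leaves exactly enough room for the 8 MOVs.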
There's a new full-speed means of reading and writing hub memory to/from cog RAM called RDWIDEA/RDWIDEB. Seairth had a link in his posting about it.
Are you saying it overlays those locations during load but then gets unmapped when the COG is started? So it doesn't actually take up any COG space?
Will the new COGNEW basically be a hub RAM -> cog RAM copier that jumps when it's done? Can the jump after the load be optional? Can I only jump one task and leave the rest running (supposing their RAM doesn't get reloaded)? Can the load be backgrounded and/or only stop the task being reloaded? How about an instruction to dump cog RAM to hub RAM, or to swap them? This looks very exciting! It would be very useful for the microkernel I (and apparently others, like Ahle2) want to make.
This cog rom stuff scares and confuses me... What's it for? Why can't it be in hub rom and get loaded in with the fancy new COGNEW when needed?
Does RDWIDEx somehow not have to wait for the cog's hub cycle or something?
electrodude
This cog ROM is nothing big, just a few instructions. It's there to make the loading process smarter, so that we can do background clearing, partial loads, and variable cog/hub jumps to start execution.
When a cog is loaded, it is completely reset. The cog RAM endures, but all the I/O registers are cleared. So, all tasks must be kicked off anew. Don't worry, though, there will be apt facilities for dealing with tasks, in order to make a multitasking OS.
RDWIDEx does the RDWIDEs in the background, so that instructions in the foreground can deal with the 100% duty cycle of long data that need to be stored.
Couldn't the ROM to start the cog be in hub?
Can we have a bit to reset the other parts of the cog? So we can start without resetting or reloading.
I would be happy with 2 START hub locations: one fast load using WIDEs where the length and cog address are on a wide boundary, and one more conventional slower load. I would expect this ROM code would test for clearing the cog and resetting the cog. Then it can jump to hub or cog to start. This then also provides the mechanisms we are after to perform routines already loaded.
When I get home tomorrow evening I can put a small hub boot code together to explain further.
That's a crazy idea! We'd have to change the return address computations for CALLs, too. Can you see any advantage to this?
When building a sine wave generator: direct digital synthesis of a sine wave using a DAC.
The four quadrants are symmetric, so:
Step #1 Send data to the DAC for the first quadrant with the PC auto-incrementing.
Step #2 Send data for the second quadrant using the instruction that reverses the PC direction.
Step #3 Send data to the DAC for the third quadrant, negating all the data.
(Is there an instruction to self-modify the sign of data in a defined memory range?)
Step #4 Send data to the DAC for the fourth quadrant, again reversing the PC direction (with the data already negated).
Step #5 Negate the data again and loop to #1.
PS: I don't like step #5. It seems to break the perfect symmetry. Could someone come up with an improved (symmetrical, only four steps, NO loop, good looking ;-) algorithm?
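The quadrant symmetry in the steps above can be sketched in Python rather than PASM. This is not the reversing-PC technique itself. It is one possible way to drop step #5, by negating each sample as it is sent instead of modifying the table in place. The function names and the choice of mid-point sampling (so the mirrored quadrants line up exactly) are assumptions for illustration.

```python
import math


def quarter_table(n):
    """First-quadrant samples, taken at mid-points so the mirror is exact."""
    return [math.sin((i + 0.5) * math.pi / (2 * n)) for i in range(n)]


def full_cycle(table):
    """Yield one full sine period using only a first-quadrant table."""
    n = len(table)
    for step in range(4 * n):
        quadrant, i = divmod(step, n)
        # Quadrants 1 and 3 read the table backwards (the "reversed" direction).
        index = i if quadrant % 2 == 0 else n - 1 - i
        sample = table[index]
        # Quadrants 2 and 3 are negated on the fly; the table is never modified.
        yield sample if quadrant < 2 else -sample
```

Because the table is left untouched, the fourth quadrant ends in the same state the first one started in, so nothing has to be undone before the next period.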
Been thinking... The default coginit should reset the cog, then fast load using wides from hub on a wide boundary (yes, demand a boundary because everyone wants best performance) for $1F0 longs (62 wides), i.e. it only loads cog $000-$1EF, then starts (jumps) at cog $000.
The calling parameters (hub load address, and parameter list) should also be copied to PTRA & PTRB.
Effectively this would be done by a jump to hub rom which would execute the above loading.
I agree that fast load should be an option. In those cases, the cog code in hub RAM will have to be wide-aligned, starting at a 32-byte boundary. There is a memory price to pay for that speed. I think in most applications, it wouldn't be worth it, especially when your program is not that big.
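The memory price of wide alignment can be estimated with a little arithmetic. The sketch below assumes a 32-byte wide (8 longs), as stated above, and additionally assumes that a fast-loaded image is padded up to a whole number of wides; that second assumption and the helper names are not from the thread.

```python
WIDE_BYTES = 32  # from the text: wide-aligned means a 32-byte boundary


def align_up(addr, boundary=WIDE_BYTES):
    """Round addr up to the next multiple of boundary (a power of two)."""
    return (addr + boundary - 1) & ~(boundary - 1)


def load_cost(start, longs):
    """Return (aligned_start, padding_bytes, wides) for a fast load.

    padding_bytes counts the bytes skipped to reach the boundary plus the
    unused longs in the last wide (assumed to be padded out).
    """
    aligned = align_up(start)
    wides = -(-longs // 8)    # ceiling division: 8 longs per wide
    tail = wides * 8 - longs  # unused longs in the final wide
    return aligned, (aligned - start) + tail * 4, wides
```

For a full $1F0-long image that would otherwise start at $1004, `load_cost(0x1004, 0x1F0)` gives 62 wides and 28 bytes of padding. For a 10-long program at the same address the padding grows to 52 bytes, more than the 40 bytes of code itself, which is the kind of case where the speed may not be worth it.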
I changed the COGINIT/COGNEW instructions around, moving them to D-only opcodes. This way, two 16-bit addresses can be packed into D for PTRA (parameters) and PTRB (program). This means that parameters will start at a long-aligned address, like on Prop1. You can use AUGD with a cog starter instruction and launch statically-placed programs in what looks like one PASM instruction.
COGINIT has been renamed to COGRUN/COGRUNX, while COGNEW/COGNEWX start an idle cog. The -X suffix means start in hub memory, without loading any code into the cog. The normal versions that load the cog can use a prefix long to state how many longs to load, where to start loading inside the cog, where to jump to when done, and whether or not to pre-clear registers.
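To illustrate why packing two 16-bit addresses into D implies long-aligned parameters, here is a sketch of one possible packing. The thread does not say which half of D holds which pointer or how the 16 bits are interpreted, so the layout below (PTRA in the upper half, PTRB in the lower half, each field holding the byte address shifted right by 2) is purely an assumption, as are the function names.

```python
def pack_d(ptra_byte_addr, ptrb_byte_addr):
    """Pack two long-aligned hub byte addresses into one 32-bit D value.

    Assumed layout (not from the source): each 16-bit field is a long
    index (byte address >> 2); PTRA in bits 31..16, PTRB in bits 15..0.
    """
    for addr in (ptra_byte_addr, ptrb_byte_addr):
        if addr & 3 or not 0 <= addr < (1 << 18):
            raise ValueError(f"not a long-aligned 18-bit address: {addr:#x}")
    return ((ptra_byte_addr >> 2) << 16) | (ptrb_byte_addr >> 2)


def unpack_d(d):
    """Inverse of pack_d: return (ptra_byte_addr, ptrb_byte_addr)."""
    return ((d >> 16) & 0xFFFF) << 2, (d & 0xFFFF) << 2
```

Sixteen bits can only cover an 18-bit byte address space at long granularity, which is why a byte address such as $1002 cannot be expressed and is rejected here.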
Fantastic, Chip, because with COGNEWX we can have a simple fast loader to load $0-$1E0 from a hub wide-aligned address and then jmp to cog $0. The loader would only be about 10 instructions and reside in hub RAM. How neat is this!
Do any of you guys ever use the IJNZ/IJNZD/IJZ/IJZD instructions? I'm looking for some opcode space and I am thinking that maybe nobody would miss these. The DJNZ-type instructions are way more useful, though. What do you say? Would you miss IJNZ-type instructions?
No worries. I'm here finally catching up with the latest build.
The DJNZ set makes a ton of sense, and we are all used to expressing things that way from P1. Incrementing isn't anything like the sweet and useful case decrementing is. The trade, whatever it is, is extremely highly likely to offer more benefit / opcode. IMHO, of course.
I've never used IJNZ or its variants and can't think of a use for it in any of my stuff. I wouldn't miss it.
Is there an IJ instruction? This would be useful as a loop counter for tight loops (eg RX bits received), also for 2^n state machines for extracting the state from the modulus of the counter.
IJNZ would be the next best thing in this case, but perhaps there is another way to achieve the same?
DJNZ is widely used, on many micros, and naturally bounds small number scans at 0.
IJNZ bounds at the top (2^32-1), which is rather less directly useful, and if you want an incrementing loop, it can be done using REPx and an INC (I forget, did REPx get a counter per thread in the end?)
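The difference between the two bounds can be seen in a small model of the loop instructions. This sketch only assumes 32-bit wraparound registers and the usual "modify D, then jump if D is not zero" behaviour; the function name `loop_passes` is invented for illustration.

```python
MASK32 = 0xFFFFFFFF


def loop_passes(start, step):
    """Count passes through a loop that ends with DJNZ (step=-1) or IJNZ (step=+1).

    Each pass adjusts the 32-bit counter by step and repeats while the
    result is non-zero. Avoid start=0 here: that models 2^32 passes.
    """
    d, passes = start & MASK32, 0
    while True:
        passes += 1
        d = (d + step) & MASK32
        if d == 0:
            return passes


# DJNZ: load the count directly.
assert loop_passes(5, -1) == 5
# IJNZ: the counter must be preloaded with -n so it wraps to zero after n passes.
assert loop_passes(-5, +1) == 5
```

With DJNZ the register holds the remaining count, while an IJNZ loop of n passes needs the negated count loaded first, which is why the decrementing form tends to be the more natural one.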
PASM palindromes ?
The only thing I'm missing now is an instruction that lets a cog tell the round-robin hub arbiter "I'm NOT using the hub", so that its cycle can be used by others.
That's right.
These changes are compiling now. I need to adjust the assembler and reassemble the ROM code to try it out.
So, we're there! We can start a cog directly from hub memory in hub exec mode, without any delays.
Looks good.
Very? Extremely? Amazingly?
Some might have said it was impossible had it not been a Propeller.
Thanks for responding so quickly! You guys are a tremendous resource.
Do you mean IJNE ( increment and Jump if Not Equal ) ? - that would give a means to bounded increment smaller numbers.
I think Tubular means...
In that case, maybe there is coverage IJ gives, that REPx does not
Can REPx early-exit via SW ?