COGINIT's Dest Field [PASM]

Vega256 · 2012-06-25 22:38

Hey guys,

A question about the PASM version of COGINIT.

The Prop Manual version 1.2 says that the destination register of COGINIT is a 32-bit field which is supposed to hold PAR, start up address, and other information about the start-up cog.
I thought that the destination and source fields of PASM only allowed for 9-bit values. What am I missing?

Mike Green · 2012-06-25 22:48

The destination and source fields are indeed 9-bit fields. The destination field always contains the address of its 32-bit operand. The source field either contains the address of its 32-bit operand or a 9-bit immediate value that's zero-extended to 32 bits. In the case of COGINIT, the destination operand holds a variety of information packed into one 32-bit value.

Phil Pilgrim (PhiPi) · 2012-06-25 22:56

Vega256,

Just to clarify Mike's comment, the 9-bit destination field contains the cog address of a register that, in turn, contains the PAR, code address, and cog ID.

-Phil

Vega256 · 2012-06-25 23:06

Ah. So this way?

coginit   #:newCog



:newCog   $AABBCCDD

potatohead · 2012-06-25 23:11

(Actually NO, see comment below!)

Yes, or the other way, only able to reference the first 9 bits of HUB RAM, or a parameter that's 9 bits.

The key in this is to understand what the # sign does. (octothorpe)

9 bits are specified either way. When there is no # sign, those 9 bits point to the COG memory address that contains the 32 bit value. When there is a # those 9 bits are the value to be used directly.

kuroneko · 2012-06-25 23:14

potatohead wrote: »

Yes, or the other way, only able to reference the first 9 bits of HUB RAM, or a parameter that's 9 bits.

Actually no, the source field for coginit is always required to be %010 (which is implied when you use the official syntax).

        coginit a            ' variant 1 (source field implied == 2)
        hubop   a, #%010     ' variant 2
        hubop   a, b         ' variant 3

a       long    $AABBCCDD
b       long    2

Vega256 · 2012-06-25 23:26

potatohead wrote: »

...When there is no # sign, those 9 bits point to the COG memory address that contains the 32 bit value. When there is a # those 9 bits are the value, embedded into the instruction itself.

Sorry for changing topic, but...

jmp   #somewhere

means jump to the address of somewhere.

jmp   somewhere

means jump to the address stored at somewhere.

How come # means the complete opposite in the case of jmp?

potatohead · 2012-06-25 23:45

Yes. Thank you for that. I was thinking the more general case. Appreciated kuroneko. I'll edit the above to cut down on confusion.

That is what I get for posting late.

Re: JMP

Think of what gets loaded into the program counter. In both cases it's an address, right? The only discussion is where the address value comes from. Values put into the program counter are used as addresses. Values used in an add instruction, for example, are either just values or addresses to values.

Compare the two:

JMP #5
JMP 5

In both cases, the instruction contains the value 5. In the first case, the presence of the "#" means the value 5 will be loaded directly into the program counter. In the second case, the value 5 is the address of the COG long that contains the value to be loaded into the program counter. Say COG long 5 contains 10. These would then be the same:

JMP #10
JMP 5

Of course, we use labels for all of that, but I find it easier to just plug numbers in sometimes.

Now the ADD:

ADD 10, #5
ADD 10, 5

Say COG long 5 contains 10 just as it did last time. In both cases, the value 5 is present in the COG long that contains the ADD instruction. In the first case, the value 5 is to be added directly to the value contained in COG long 10.

Say COG long 10 contains 3. After the addition, COG long 10 would contain 8, in the first case.

In the second case, the value 5 is the address of the value to be added to the contents of COG long 10. After the addition, the COG long 10 would contain 13.

It really isn't any different. In both cases, a value is either used directly, or as an address to a value. The difference lies in the program counter treating values it sees as addresses, because that is what it does.
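
A minimal sketch of the rule described above, written as a Python model rather than PASM. The names `source_value`, `add`, `jmp`, and `fresh_cog` are made up for illustration, and cog RAM is assumed to be a plain list of 512 longs. It only models how the 9-bit source field is resolved with and without "#"; it does not model flags, timing, or the special-purpose registers.

```python
COG_LONGS = 512          # cog RAM: 512 longs, addresses 0..511
MASK32 = 0xFFFFFFFF


def source_value(cog, src_field, immediate):
    """Resolve a 9-bit source field to the 32-bit operand it selects."""
    src_field &= 0x1FF                   # only 9 bits fit in the instruction
    if immediate:                        # '#' given: the field is the value
        return src_field
    return cog[src_field] & MASK32       # no '#': the field is a cog address


def add(cog, dest_field, src_field, immediate=False):
    """Model ADD: the destination field is always a cog address."""
    dest_field &= 0x1FF
    total = cog[dest_field] + source_value(cog, src_field, immediate)
    cog[dest_field] = total & MASK32


def jmp(cog, src_field, immediate=False):
    """Model JMP: the resolved source operand becomes the new pc (9 bits)."""
    return source_value(cog, src_field, immediate) & 0x1FF


def fresh_cog():
    """Cog RAM set up as in the post: long 5 holds 10, long 10 holds 3."""
    cog = [0] * COG_LONGS
    cog[5] = 10
    cog[10] = 3
    return cog
```

With this model, `add(cog, 10, 5, immediate=True)` corresponds to `ADD 10, #5` and `add(cog, 10, 5)` to `ADD 10, 5`. `jmp` resolves its operand in exactly the same way; the only difference is that the result is then used as an address.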

Mark_T · 2012-06-26 03:09

Vega256 wrote: »
Sorry for changing topic, but...
jmp   #somewhere
means jump to the address of somewhere.
jmp   somewhere
means jump to the address stored at somewhere.

How come # means the complete opposite in the case of jmp?

It doesn't at all. Consider this:

:label   ...
         ....
         mov    pc, #:label    ' move the value :label into the program counter register - normally what you need
         mov    pc, :label     ' move the value in cog ram location :label into the program counter register

         mov    r, #:label     ' move the value of :label into r
         mov    r, :label      ' move the instruction at :label (the contents of the cog register labelled ':label') into r

The 'jmp' opcode really means 'move into the pc' (jmp/call/ret are the only instructions that can access the pc directly). You also have to realize that each instruction lives in a cog register and that labels are just names for cog registers. The jmp instruction updates the pc; the instruction-execution unit then reads cog ram at the address held in the pc to fetch the instruction.

Vega256 · 2012-06-26 06:57

PH, Mark_T

Yeah, that's right; I just got confused and second guessed myself. I've done asm for the Z80 and x86 architectures, so the program counter and how an instruction jump works isn't new to me. The notation is the complete opposite from my other assemblers. If I try to assemble something like this in Z80 asm,

JP 10

My assembler assumes that I mean literal 10, so if someLabel is at address 10, I can just put

JP someLabel

and it assumes the address at someLabel (which is 10). I don't need '#' anywhere; value means #value.

But back to the case of coginit. The destination field is really just the location where the 32-bit is, like Phil said. So then,

coginit  #:newCog

:newCog  long  $AABBCCDD

Should be

coginit  :newCog

:newCog  long  $AABBCCDD

Without the #?

tonyp12 · 2012-06-26 07:26

Correct.
With other MCUs you are used to having R0 to R15 etc.
Think of the Prop as having R0 to R511.
You would never type #R15 on another MCU either, only #nn when you want immediate values.

That the Prop can actually have cog code in its R registers is something a little harder to comprehend.

Vega256 · 2012-06-26 07:29

tonyp12 wrote: »

Correct.
With other MCUs you are used to having R0 to R15 etc.
Think of the Prop as having R0 to R511.
You would never type #R15 on another MCU either, only #nn when you want immediate values.

That the Prop can actually have cog code in its R registers is something a little harder to comprehend.

Thanks tonyp12.

Also, regarding bit 3 of the bitfield, what does it mean to restart a cog? Does it just send the specified cog to some other place in hub ram?

Heater. · 2012-06-26 07:44

The PASM coginit can act like a Spin cognew or coginit. Bit three determines which.
From the manual:

If the third field bit is set (1), the Hub will start the next available (lowest-numbered inactive)
cog and return that cog’s ID in Destination (if the WR effect is specified).

If the third field bit is clear (0), the Hub will start or restart the cog identified by Destination’s
fourth field, bits 2:0.
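
The two cases in the manual text can be sketched as a small Python model. This is only an illustration: `coginit_target` and the `running` list are hypothetical names, and returning `None` when no cog is free is a simplification of this model, not a description of what the hardware returns.

```python
NEW_COG_BIT = 0b1000     # bit 3 of the coginit destination value
COG_ID_MASK = 0b0111     # bits 2:0 of the coginit destination value


def coginit_target(dest_value, running):
    """Return the ID of the cog the hub would start or restart, or None.

    running: sequence of 8 booleans, True where that cog is already active.
    """
    if dest_value & NEW_COG_BIT:
        # Bit 3 set: pick the lowest-numbered inactive cog (cognew behaviour).
        for cog_id, active in enumerate(running):
            if not active:
                return cog_id
        return None      # model-only placeholder for "no cog available"
    # Bit 3 clear: start or restart the cog named in bits 2:0.
    return dest_value & COG_ID_MASK
```

For example, with cogs 0 and 1 active, a value with bit 3 set selects cog 2, while `%0001` selects cog 1 even though it is already running, which is the "restart" case.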

"restart" just means that you have given coginit the id of a cog and set that bit 3 to zero and that the specified cog is already running something. It just stops that something and loads up your new code.

Vega256 · 2012-06-26 17:36

Does anyone know how long it would take for a cog to fill its ram with code?

kuroneko · 2012-06-26 17:41

Vega256 wrote: »

Does anyone know how long it would take for a cog to fill its ram with code?

About 8K cycles (512 hub windows). The exact number depends on which cog is started and who is doing the coginit.
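
The figure above follows from one long being transferred per hub window, with each cog getting a hub window once every 16 clocks. A quick Python check of the arithmetic, assuming an 80 MHz system clock (the clock rate is an assumption here, it is not stated in the thread) and ignoring the variable wait for the first hub window:

```python
HUB_WINDOW_CLOCKS = 16   # a cog gets one hub access window every 16 clocks
LONGS_LOADED = 512       # number of hub windows quoted in the thread


def load_cycles(longs=LONGS_LOADED, window=HUB_WINDOW_CLOCKS):
    """Clock cycles needed to copy `longs` longs, one per hub window."""
    return longs * window


def load_time_us(clock_hz=80_000_000):
    """Load time in microseconds at the given (assumed) system clock."""
    return load_cycles() / clock_hz * 1e6
```

`load_cycles()` gives 8192 cycles, which at the assumed 80 MHz works out to roughly 102 microseconds.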

Vega256 · 2012-06-26 21:04

Finally getting to experiment with this.

How come the second bit-field for coginit is the upper 14 bits of a 16-bit address? I thought hub ram was only 32k large. What are the first two bits set to?

kuroneko · 2012-06-26 21:16

Hub memory covers 64K (RAM/ROM), which can be covered by 16-bit addresses. Also, coginit deals with longs, which means that only the upper 14 bits of said address are relevant anyway. IOW it doesn't matter what those two low bits are set to, given that they are ignored by rdlong.

Vega256 · 2012-06-26 21:45

kuroneko wrote: »

Hub memory covers 64K (RAM/ROM), which can be covered by 16-bit addresses. Also, coginit deals with longs, which means that only the upper 14 bits of said address are relevant anyway. IOW it doesn't matter what those two low bits are set to, given that they are ignored by rdlong.

But isn't it possible that the start up address is not a multiple of 4? Doesn't code exist in between longs?

kuroneko · 2012-06-26 21:51

Vega256 wrote: »

But isn't it possible that the start up address is not a multiple of 4? Doesn't code exist in between longs?

An instruction is always 32 bits in size. While it is true that you could place code unaligned in hub (e.g. at 4n+3), you wouldn't be able to load it (with coginit) until it is aligned.

That said, coginit only allows for 4n addresses (par and code base, 14+14+1+3), i.e. the lower two bits are always cut off.
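
A minimal sketch of the 14+14+1+3 layout, again as a Python model rather than PASM. It assumes the field order given in the manual: PAR address bits 15:2 in bits 31:18, code address bits 15:2 in bits 17:4, the "new cog" flag in bit 3, and the cog ID in bits 2:0. The function names are hypothetical.

```python
def pack_coginit(par_addr, code_addr, new_cog=True, cog_id=0):
    """Build the 32-bit value that coginit's destination register holds."""
    par_field = (par_addr >> 2) & 0x3FFF     # low two address bits are dropped
    code_field = (code_addr >> 2) & 0x3FFF   # likewise for the code address
    return ((par_field << 18) | (code_field << 4)
            | (int(bool(new_cog)) << 3) | (cog_id & 0b111))


def unpack_coginit(value):
    """Split a coginit value into (par_addr, code_addr, new_cog, cog_id)."""
    par_addr = ((value >> 18) & 0x3FFF) << 2
    code_addr = ((value >> 4) & 0x3FFF) << 2
    return par_addr, code_addr, bool(value & 0b1000), value & 0b111
```

Packing an unaligned code address such as `$0013` and unpacking it again yields `$0010`, which shows the lower two bits being cut off as described.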

Vega256 · 2012-06-30 11:48

Sorry for bumping, but now, I'm not so worried about how to get coginit working (still haven't), but how I can go about solving the much bigger problem that I thought coginit was the solution for.

I wrote a graphics driver in ASM which happened to be too big for a cog to handle. I thought that one way to solve this problem, aside from shrinking the code, was to split it into two pieces. I would then run the first part of the code with a cog, and when that same cog reaches the end of the first piece, restart it at the address of the second piece using coginit. This way, I have a continuous piece of code? Maybe I am over-complicating this. Do you guys see a different solution?

Mark_T · 2012-06-30 12:06

Vega256 wrote: »

Sorry for bumping, but now, I'm not so worried about how to get coginit working (still haven't), but how I can go about solving the much bigger problem that I thought coginit was the solution for.

I wrote a graphics driver in ASM which happened to be too big for a cog to handle. I thought that one way to solve this problem, aside from shrinking the code, was to split it into two pieces. I would then run the first part of the code with a cog, and when that same cog reaches the end of the first piece, restart it at the address of the second piece using coginit. This way, I have a continuous piece of code? Maybe I am over-complicating this. Do you guys see a different solution?

Code chaining like this should work but there will be a delay as the new code is loaded (I think this is 100us or so, certainly that's the fastest hub memory can be loaded up into the cog). You'll have to keep your live state in hub RAM of course.

potatohead · 2012-06-30 12:09

Starting the other COG will take ~8K cycles or so. Do you have time for that? Perhaps you were thinking of having the COG load while the busy one is still working?

IMHO, splitting it into two pieces is the right idea. Instead of serializing things, it is worth it to think about how to get the cogs to perform the task in parallel, so that the two COGS are just running, using some signal to communicate the processing tasks.

What are the driver tasks? We can discuss higher level structure in an attempt to find a good division of labor. Or, perhaps driver data is in the COG, and it can be placed in the HUB to free space.

Dave Hein · 2012-06-30 12:17

If you can spare an extra cog you can use one cog to run the first half and another cog to run the second half. You can use a location in hub RAM to signal between the cogs. This assumes that you just need more program memory, and you are not constrained by speed. If you also need to speed up execution you should use potatohead's suggestion and run the two cogs in parallel.

potatohead · 2012-06-30 12:26

I'm thinking overlap might make sense too, but I've not done it that way myself. Start the secondary COG early so it's running. Then have it latched to a signal from the first one. The very first thing it does is launch the first cog, which is latched in the same way.

I'm curious about the driver tasks myself. Good solutions will be more obvious.

Vega256 · 2012-06-30 12:35

potatohead wrote: »

Starting the other COG will take ~8K cycles or so. Do you have time for that?

I do not think so. The division happens between drawing tiles and drawing sprites. Would it make a difference if I was slow drawing sprites?

potatohead wrote: »

What are the driver tasks? We can discuss higher level structure in an attempt to find a good division of labor.

Algorithmically, what's going on is...

Driver reads tile data
COG puts tile data on a scanline buffer for that particular scanline
Driver reads sprite data
COG puts sprite data on the scanline buffer that particular scanline
TV Driver renders the line from the scanline buffer
Repeat for the next scanline

potatohead wrote: »

Or, perhaps driver data is in the COG, and it can be placed in the HUB to free space.

All of this is in COG ram; however, multiple COGs are doing this, and they each do their own line. That's why I don't think putting the entire driver in the HUB would work, because each COG needs its own copy of the driver. Would it work if I put all of it in the HUB?

Vega256 · 2012-06-30 12:36

Dave Hein wrote: »

If you can spare an extra cog you can use one cog to run the first half and another cog to run the second half. You can use a location in hub RAM to signal between the cogs. This assumes that you just need more program memory, and you are not constrained by speed. If you also need to speed up execution you should use potatohead's suggestion and run the two cogs in parallel.

I thought of that, but speed is definitely a factor here.

Vega256 · 2012-06-30 12:41

Hey guys,

If you want to check the drivers out, they are here
http://forums.parallax.com/showthread.php?140874-Graphics-Driver-Improvement-and-Optimization.

Phil Pilgrim (PhiPi) · 2012-06-30 12:42

Why not have one cog for the tile data and another for the sprites? You would need a double scanline buffer, so that the tile cog stayed one line ahead of the sprite cog. That way the sprites would always get written after, and on top of, the tile pixels.

-Phil

potatohead · 2012-06-30 13:21

I would restructure this.

The tile sprite driver in my signature has some great rotating scan line buffer code that Bangers and I did.

It works like this:

One cog does the signal. It sets up the buffers, pointers to things, and the cog signal locations. It then can start graphics cogs.

There would be one buffer per graphics cog, plus one for the current scan line, two minimum.

Right away, the graphics cogs all render their scan lines and wait. When the signal cog begins to draw one to the screen, that buffer becomes the current buffer, signaling the graphics cog that built it to move to the next available buffer in the circular set of buffers, and it all continues to draw all the scan lines.

If you only use one graphics cog, it only has one scan line of time to work. That will get tiles and a few sprites. If you use two or four or more, they have x scan lines to work meaning more robust graphics on each scan line.

All graphics cogs are the same, meaning only one image in the hub, as well as overwriting it for buffers and such after the driver is running.
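
One way to picture the circular buffer set is the Python sketch below. The round-robin assignment of scan lines to graphics cogs and to a ring of "graphics cogs + 1" buffers is an assumption made for illustration; the actual driver may hand out buffers differently, and `buffer_schedule` is a made-up name.

```python
def buffer_schedule(num_graphics_cogs, num_lines):
    """Return a list of (line, cog, buffer) tuples, one per scan line.

    Assumed scheme: lines are dealt round-robin to the graphics cogs and
    to a ring of num_graphics_cogs + 1 scan line buffers.
    """
    num_buffers = num_graphics_cogs + 1      # one per cog plus the current line
    return [(line, line % num_graphics_cogs, line % num_buffers)
            for line in range(num_lines)]
```

Under this assumed scheme, any run of `num_graphics_cogs + 1` consecutive lines lands in distinct buffers, so the buffer being displayed is never one that a graphics cog is still filling for an upcoming line.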

Vega256 · 2012-06-30 13:41

potatohead wrote: »

I would restructure this.

The tile sprite driver in my signature has some great rotating scan line buffer code that Bangers and I did.

It works like this:

One cog does the signal. It sets up the buffers, pointers to things, and the cog signal locations. It then can start graphics cogs.

There would be one buffer per graphics cog, plus one for the current scan line, two minimum.

Right away, the graphics cogs all render their scan lines and wait. When the signal cog begins to draw one to the screen, that buffer becomes the current buffer, signaling the graphics cog that built it to move to the next available buffer in the circular set of buffers, and it all continues to draw all the scan lines.

If you only use one graphics cog, it only has one scan line of time to work. That will get tiles and a few sprites. If you use two or four or more, they have x scan lines to work meaning more robust graphics on each scan line.

All graphics cogs are the same, meaning only one image in the hub, as well as overwriting it for buffers and such after the driver is running.

You think I could modify it for another hardware configuration? I don't have the demoboard.

potatohead · 2012-06-30 14:07

What is your hardware?

Edit: Well, that driver could be modded to your hardware. Have to see what that is.

Or, the techniques can be applied to the one you've got cooking too.

I really only referenced it, because the buffers and graphics COG interaction in that one are well aligned with what you want to do.
