The New 16-Cog, 512KB, 64 analog I/O Propeller Chip - Part 2

potatohead · 2015-09-22 02:09

I think the lookup table will get a lot of use. Enough that also being able to run code out of it doesn't conflict with the name LUT.

rabaggett · 2015-09-22 02:30

@ Heater
The right way to do things? Well, that's MY way, of course!
@RJO
Bag of turtles? COOL, where? (Turtle clan member, by the way..)
@ Evan H
Thanks. As I think about it, even an inverter between pins would need a way to gate it in and out, then some way to control it, then another instruction... Yuck!
SN74AUC1G04 doesn't take that much room. But darnit anyway.

msrobots · 2015-09-22 03:05

Heck Guys, HUB execution is just one mode of using the P2.

Losing the first 4K of 512K for HUB exec is not really a problem.

All of these memory schemes seem way too complicated to be worth the effort. Confusing to newcomers, complicated for compiler builders and just not worth the pain of documenting and complicating the whole shebang.

As far as I understand, it is accessible as HUB RAM for reading and writing as usual, but cannot be executed in HUB exec mode. So what? We will still have 508K able to store HUB exec code, just not the first 4K.

Like Roy said, keep it as simple as possible.

Do not make it more complicated than needed.

Enjoy!

Mike

koehler · 2015-09-22 03:25

I agree, shame to take something that's shaped up to address many concerns and complaints of the P1 and then add a boat anchor of ARM-like startup weirdness.

Complexity outweighs the benefits for the masses.

jmg · 2015-09-22 06:03

koehler wrote: »

I agree, shame to take something that's shaped up to address many concerns and complaints of the P1 and then add a boat anchor of ARM-like startup weirdness.

Complexity outweighs the benefits for the masses.

Which complexity are you talking about?
The complexity of coding a 4K offset into all loader code before it will run?

To me, just selecting HUBSEG and starting code, is inherently simpler. Let the tools manage the details.

cgracey · 2015-09-22 06:41

There were some complexities to do with addressing that led to the cog RAM being regarded as 2k of bytes rather than 512 longs.

I'm wondering if I stepped the PC by 1, instead of 4, during cog/LUT exec, if we can simplify the cog address range to 512, followed by 512 LUT addresses. This way, only 0..$3FF of hub range is taken. Also, we can get away from this issue of all cog addresses appearing as 4x their actual register addresses. That is a real nuisance to consider during programming, sometimes.

AntoineDoinel · 2015-09-22 11:11

msrobots wrote: »

Heck Guys, HUB execution is just one mode of using the P2.

All of these memory schemes seem way too complicated to be worth the effort. Confusing to newcomers, complicated for compiler builders and just not worth the pain of documenting and complicating the whole shebang.

Mike

Not necessarily more complicated... I realized that what I was suggesting is probably already implied in Chip's original plan, somewhat!

Given that

%000000000xxxxxxxxx00 = cog-exec
%000000001xxxxxxxxx00 = LUT-exec
everything else = hub-exec

then

%1000000000xxxxxxxx00 = hub-exec from lower 4KB

...since it falls in the "everything else" case.

With 1MB or more, you simply extend the PC to the left by one bit larger than the HUB size, exactly as it is right now for 512KB.

Now, personally I would prefer leaving the two lower bits ignored for anything long aligned, code or data, like it was on P1:

%000000000xxxxxxxxxXX = cog-exec
%000000001xxxxxxxxxXX = LUT-exec
%everything else = hub-exec

But that's just my personal preference, it seems cleaner to me.
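
As a rough illustration of the idea above, here is a small Python model of the decode. It is only a sketch, not Chip's hardware: the function name `decode_pc` and the `HUB_MASK` constant are made up for this example, and it assumes a 512KB hub with a 20-bit PC, so that the top bit is a don't-care in the hub-exec case.

```python
HUB_MASK = 0x7FFFF  # assumption: 512KB hub, so bit 19 of the 20-bit PC is ignored


def decode_pc(pc):
    """Classify a 20-bit PC under the 'everything else is hub-exec' rule."""
    pc &= 0xFFFFF
    if pc < 0x800:                       # %000000000xxxxxxxxxXX
        return ("cog", (pc >> 2) & 0x1FF)
    if pc < 0x1000:                      # %000000001xxxxxxxxxXX
        return ("lut", (pc >> 2) & 0x1FF)
    # Everything else is hub-exec; masking off bit 19 makes $80000 alias hub byte 0.
    return ("hub", pc & HUB_MASK)
```

In this model a jump to $80000 executes from hub byte 0, while a jump to $00000 still means cog register 0.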

Dave Hein · 2015-09-22 11:18

Personally I like the simple method of using the long addresses 0-511 for cog RAM, 512-1023 for LUT and 1024 and greater for hubexec. I don't see any problems with excluding the first 1024 longs (4096 bytes) of hub RAM from hubexec.

I also don't see any major advantages to allowing non-aligned hub execution, and I think requiring long alignment is not a limitation. Eliminating byte addressing for hubexec would either allow for more memory in the future, or the 2 extra bits could be used for something else, such as more instructions.

Seairth · 2015-09-22 11:21

cgracey wrote: »

There were some complexities to do with addressing that led to the cog RAM being regarded as 2k of bytes rather than 512 longs.

I'm wondering if I stepped the PC by 1, instead of 4, during cog/LUT exec, if we can simplify the cog address range to 512, followed by 512 LUT addresses. This way, only 0..$3FF of hub range is taken. Also, we can get away from this issue of all cog addresses appearing as 4x their actual register addresses. That is a real nuisance to consider during programming, sometimes.

I like this, as it means I don't have to think of the cog (or lut) RAM as byte memory. Addressing would be:

%0000_0000000x_xxxxxxxx : Cog registers, data is long-addressed from $000 to $1FF, instructions are long-addressed from $0000 to $01FF
%0000_0000001x_xxxxxxxx : LUT registers, data is long-addressed from $000 to $1FF, instructions are long-addressed from $0200 to $03FF
%0xxx_xxxxxxxx_xxxxxxxx : Hub RAM, data is byte-addressed from $00000 to $7FFFF, instructions are byte-addressed from $00400 to $7FFFF

In other words:
* Cog and LUT registers each have their own address range and instruction set for data addressing
* Hub memory has its own address range and instruction set for data addressing
* All memory shares a common address range and instruction set for instruction addressing
* The PC increments by 1. For hub execution mode, PC is multiplied by 4 to get the correct instruction.

Seairth · 2015-09-22 11:32

Dave Hein wrote: »

I also don't see any major advantages to allowing non-aligned hub execution, and I think requiring long alignment is not a limitation. Eliminating byte addressing for hubexec would either allow for more memory in the future, or the 2 extra bits could be used for something else, such as more instructions.

Agreed! I suspect that unaligned instructions will result in some difficult-to-find bugs. Initially, I would have suggested just making all instruction addresses long-addressed. However, that would mean that we'd be right back to not having the first $1000 bytes of hub for execution. On the other hand relative branching during hub execution will allow for 4x greater range (or was that already long-oriented?).

cgracey · 2015-09-22 18:22

I started modifying things so that hub addresses 0..$3FF would be cog/LUT execution and $400+ would be hub execution, but I ran into a gotcha on the 20-bit relative branches. I don't see a clean way around the problem of the PC incrementing by 1 in cog/LUT execution and incrementing by 4 in hub execution, when it comes to handling 20-bit relative branches. It means that cog/LUT code would not work in the hub and vice-versa - unless we constrained all hub code to long-aligned addresses. As it is now, the same code, using relative branches, can run in either the cog, LUT, or hub. To make cog execution addresses 0..$1FF, LUT $200..$3FF, and hub $400+ means that some funny stuff must go on in the hardware, as well as the assembler. It just looks like a lot of trouble.
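
A toy Python sketch of the relative-branch gotcha described above. This is not the real P2 branch encoding: the helper `rel_offset` is hypothetical and it ignores details such as which PC value the offset is added to. It only shows that the offset needed to skip a fixed number of instructions depends on the PC step size.

```python
def rel_offset(instructions_ahead, pc_step):
    """Offset that would have to be encoded to skip ahead by N instructions."""
    return instructions_ahead * pc_step


# Byte-style addressing everywhere: the PC steps by 4 in cog, LUT, and hub.
uniform = {mode: rel_offset(3, 4) for mode in ("cog", "lut", "hub")}

# Mixed addressing: the PC steps by 1 in cog/LUT but by 4 in hub.
mixed = {
    "cog": rel_offset(3, 1),
    "lut": rel_offset(3, 1),
    "hub": rel_offset(3, 4),
}

print(uniform)  # same offset in every mode, so the code can be relocated as-is
print(mixed)    # cog/LUT and hub disagree, so the same binary cannot run in both
```

With a uniform step, the same encoded offset works wherever the code is placed; with a mixed step, the assembler would have to know in advance where the code will run.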

So, I will go back to the original mapping:

%000000000CCCCCCCCCxx  = cog exec addresses ($000..$7FF, 512 longs, %xx treated as %00)
%000000001LLLLLLLLLxx  = LUT exec addresses ($800..$FFF, 512 longs, %xx treated as %00)
%00000001000000000000+ = hub exec addresses ($1000+)

This whole thing flared up on me when we went to hub-exec as default. Suddenly, boot code could not be located below $1000.
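
For readers who prefer code to bit patterns, the original mapping above can be modeled roughly as follows in Python. The function name `exec_region` is invented for this sketch; the ranges come straight from the table above, with the low two bits dropped for cog/LUT addresses.

```python
def exec_region(pc):
    """Return (region, address) for a 20-bit PC under the original mapping."""
    pc &= 0xFFFFF
    if pc < 0x800:
        # $000..$7FF: cog exec, 512 longs, %xx treated as %00
        return ("cog", pc >> 2)
    if pc < 0x1000:
        # $800..$FFF: LUT exec, 512 longs, %xx treated as %00
        return ("lut", (pc - 0x800) >> 2)
    # $1000 and up: hub exec, byte-addressed
    return ("hub", pc)
```

Note that in this model hub bytes $00000..$00FFF can never be reached by the PC, which is exactly why boot code could not be located below $1000.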

potatohead · 2015-09-22 18:38

I still don't see the big deal on allowing non aligned code in what could be the boot or system area... seems a nice fix.

All user programs just start at $1000 and that area is for data, or very specialized code...

AntoineDoinel · 2015-09-22 18:42

cgracey wrote: »
I started modifying things so that hub addresses 0..$3FF would be cog/LUT execution and $400+ would be hub execution, but I ran into a gotcha on the 20-bit relative branches. I don't see a clean way around the problem of the PC incrementing by 1 in cog/LUT execution and incrementing by 4 in hub execution, when it comes to handling 20-bit relative branches. It means that cog/LUT code would not work in the hub and vice-versa - unless we constrained all hub code to long-aligned addresses. As it is now, the same code, using relative branches, can run in either the cog, LUT, or hub. To make cog execution addresses 0..$1FF, LUT $200..$3FF, and hub $400+ means that some funny stuff must go on in the hardware, as well as the assembler. It just looks like a lot of trouble.

So, I will go back to the original mapping:
%000000000CCCCCCCCCxx  = cog exec addresses ($000..$7FF, 512 longs, %xx treated as %00)
%000000001LLLLLLLLLxx  = LUT exec addresses ($800..$FFF, 512 longs, %xx treated as %00)
%00000001000000000000+ = hub exec addresses ($1000+)
This whole thing flared up on me when we went to hub-exec as default. Suddenly, boot code could not be located below $1000.

Chip, forgive me if I keep beating a (most probably) dead horse, but what would happen in the scheme shown above if, after filling $00000-$03FFF, I simply jump at $80000?
Couldn't you make the topmost bit of PC a "don't care" only in the third case?

Rayman · 2015-09-22 18:51

Would that work?

If the normal case is going to be hubexec, why not use the top-most bit of the address to signify local cog memory?

It's been so long, I don't even remember how many bits are used for addresses now...
If it's full 32, seems shouldn't be a problem...

Sorta like positive addresses are HUB, negative are COG...

cgracey · 2015-09-22 19:03

AntoineDoinel wrote: »
cgracey wrote: »
I started modifying things so that hub addresses 0..$3FF would be cog/LUT execution and $400+ would be hub execution, but I ran into a gotcha on the 20-bit relative branches. I don't see a clean way around the problem of the PC incrementing by 1 in cog/LUT execution and incrementing by 4 in hub execution, when it comes to handling 20-bit relative branches. It means that cog/LUT code would not work in the hub and vice-versa - unless we constrained all hub code to long-aligned addresses. As it is now, the same code, using relative branches, can run in either the cog, LUT, or hub. To make cog execution addresses 0..$1FF, LUT $200..$3FF, and hub $400+ means that some funny stuff must go on in the hardware, as well as the assembler. It just looks like a lot of trouble.

So, I will go back to the original mapping:
%000000000CCCCCCCCCxx  = cog exec addresses ($000..$7FF, 512 longs, %xx treated as %00)
%000000001LLLLLLLLLxx  = LUT exec addresses ($800..$FFF, 512 longs, %xx treated as %00)
%00000001000000000000+ = hub exec addresses ($1000+)
This whole thing flared up on me when we went to hub-exec as default. Suddenly, boot code could not be located below $1000.
Chip, forgive me if I keep beating a (most probably) dead horse, but what would happen in the scheme shown above if, after filling $00000-$03FFF, I simply jump at $80000?
Couldn't you make the topmost bit of PC a "don't care" only in the third case?

We could absolutely do that. It would limit future memory size to 512KB, though, whereas we now have a 1MB address space (20 bits).

Maybe this would be best:

%1111_1111_0CCC_CCCC_CCxx  = cog exec addresses ($FF000..$FF7FF, 512 longs, %xx treated as %00)
%1111_1111_1LLL_LLLL_LLxx  = LUT exec addresses ($FF800..$FFFFF, 512 longs, %xx treated as %00)
below %1111_1111_0000_0000_0000 = hub exec addresses (below $FF000)

To jump to cog registers you'd JMP #$FF000. Maybe that's too ugly, too. Would it cause mental confusion between cog execution addresses and register addresses?
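
A quick Python model of this top-justified variant, for comparison. The name `exec_region_top` is made up for the sketch and the ranges are taken from the table above; nothing here reflects actual hardware.

```python
def exec_region_top(pc):
    """Return (region, address) for a 20-bit PC with cog/LUT at the top of the map."""
    pc &= 0xFFFFF
    if pc >= 0xFF800:
        # $FF800..$FFFFF: LUT exec, 512 longs, %xx treated as %00
        return ("lut", (pc - 0xFF800) >> 2)
    if pc >= 0xFF000:
        # $FF000..$FF7FF: cog exec, 512 longs, %xx treated as %00
        return ("cog", (pc - 0xFF000) >> 2)
    # Everything below $FF000 is hub exec, including byte 0.
    return ("hub", pc)
```

Here hub-exec works from $00000, but a jump to cog register 0 is written as $FF000, which is the mismatch between execution addresses and register addresses raised above.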

Electrodude · 2015-09-22 19:09

Please just make it as originally planned and make cog0 start at $1000 on boot. No program that has 508KB of hubexec code will have less than 4KB data. Not being able to hubexec the bottom 4KB won't be a problem after boot, and I would prefer more complicated boot logic in favor of simpler runtime behavior.

cgracey · 2015-09-22 19:10

potatohead wrote: »

I still don't see the big deal on allowing non aligned code in what could be the boot or system area... seems a nice fix.

All user programs just start at $1000 and that area is for data, or very specialized code...

I actually feel the same way. It seems like the best solution.

Top-justifying cog execution in the hub memory could cause upset on future devices, even expanding LUT memory. Keeping things bottom-justified leaves things more open-ended.

Seairth · 2015-09-22 19:15

cgracey wrote: »
So, I will go back to the original mapping:
%000000000CCCCCCCCCxx  = cog exec addresses ($000..$7FF, 512 longs, %xx treated as %00)
%000000001LLLLLLLLLxx  = LUT exec addresses ($800..$FFF, 512 longs, %xx treated as %00)
%00000001000000000000+ = hub exec addresses ($1000+)
This whole thing flared up on me when we went to hub-exec as default. Suddenly, boot code could not be located below $1000.

Since this is what everyone was expecting originally, how about go with this for the first FPGA image release. Then, while we are all testing it out, maybe we'll come up with a better idea.

Seairth · 2015-09-22 19:23

Chip,

Does WRLONG now work on unaligned addresses as well?

David Betz · 2015-09-22 19:26

cgracey wrote: »

potatohead wrote: »

I still don't see the big deal on allowing non aligned code in what could be the boot or system area... seems a nice fix.

All user programs just start at $1000 and that area is for data, or very specialized code...

I actually feel the same way. It seems like the best solution.

Top-justifying cog execution in the hub memory could cause upset on future devices, even expanding LUT memory. Keeping things bottom-justified leaves things more open-ended.

"There is a fifth dimension, beyond that which is known to man. It is a dimension as vast as space and as timeless as infinity. It is the middle ground between light and shadow, between science and superstition, and it lies between the pit of man's fears and the summit of his knowledge. This is the dimension of imagination. It is an area which we call the COG Shadow Zone."

cgracey · 2015-09-22 19:41

Seairth wrote: »

Chip,

Does WRLONG now work on unaligned addresses as well?

Yes. Alignment doesn't matter, anymore, for all those RDxxxx/WRxxxx instructions. Same for hub-exec.

Seairth · 2015-09-22 20:02

Okay, here's another thought: the only time that this addressing overlap is an issue is when you are switching from one exec mode to another. Suppose you used SETQ (well, give it a different mnemonic or set of mnemonics, but same opcode) just prior to any branch that is switching from one mode to another. In all other cases, instruction addressing is strictly limited to the current execution mode.

Your earlier example would look like:

	orgh

	<this is the entry point of your hub-exec program>

hcode	setq	#$1F7			'ready to load $1F8 registers (can be reduced, of course)
	rdlong	0,ptrb[ccode - hcode]	'load registers $000..$1F7 (doesn't need to start at 0)
	setq    #0 			' 0 = cog, 1 = lut, 2 = hub (default)
	jmp	#0			'jump to loaded code in cog RAM
ccode
	org

	<your cog-exec program goes here>

cgracey · 2015-09-22 20:14

David Betz wrote: »

cgracey wrote: »

potatohead wrote: »

I still don't see the big deal on allowing non aligned code in what could be the boot or system area... seems a nice fix.

All user programs just start at $1000 and that area is for data, or very specialized code...

I actually feel the same way. It seems like the best solution.

Top-justifying cog execution in the hub memory could cause upset on future devices, even expanding LUT memory. Keeping things bottom-justified leaves things more open-ended.

"There is a fifth dimension, beyond that which is known to man. It is a dimension as vast as space and as timeless as infinity. It is the middle ground between light and shadow, between science and superstition, and it lies between the pit of man's fears and the summit of his knowledge. This is the dimension of imagination. It is an area which we call the COG Shadow Zone."

Pit of man's fear. It leans in that direction. It is a raspberry-seed-in-your-wisdom-tooth kind of situation.

Okay...

This is ugly as all get-out, but here's what would work very nicely (consider that this allows for a 1K x 32 LUT):

%000000000xxxxxxxxx01 = cog execution addresses 0..511
%000000000xxxxxxxxx10 = LUT execution addresses 0..511
%000000000xxxxxxxxx11 = LUT execution addresses 512..1023
all others = hub execution addresses

This way, hub-exec would work from $00000 - perfect for ROM booting

Special-consideration memory only goes from $00000 to $007FF.

Nobody would notice these funny %01, %10 and %11 LSB's in cog/LUT addresses because they would be contained in symbols, with their LSB's established by the particular ORGCOG/ORGLUT/ORGLUT2 directive used before their declaration.

I think someone suggested something like this before.

Increasing the LUTs to 1K x 32 would only take about 1.2 mm2 of die area. If we couldn't fit it into this device, it could certainly go into a future smaller-geometry chip. We could implement it on the FPGA, in any case.

This would give 1,528 internal instructions per cog.

P.S. It was Seairth who had proposed something like this on the prior page.
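
To make the LSB-tag idea concrete, here is a hedged Python sketch. The names `make_symbol` and `decode_tagged` are invented for illustration, and the assembler behavior is only assumed from the ORGCOG/ORGLUT/ORGLUT2 description above.

```python
TAG_COG, TAG_LUT, TAG_LUT2 = 0b01, 0b10, 0b11


def make_symbol(index, tag):
    """Value an assembler might store for a label declared after ORGCOG/ORGLUT/ORGLUT2."""
    return ((index & 0x1FF) << 2) | tag


def decode_tagged(pc):
    """Return (region, address) for a 20-bit PC using the two LSBs as a region tag."""
    pc &= 0xFFFFF
    tag = pc & 0b11
    if pc < 0x800 and tag != 0:
        index = pc >> 2
        if tag == TAG_COG:
            return ("cog", index)        # cog execution addresses 0..511
        if tag == TAG_LUT:
            return ("lut", index)        # LUT execution addresses 0..511
        return ("lut", 512 + index)      # LUT execution addresses 512..1023
    # Everything else is hub exec, including the long-aligned addresses below $800.
    return ("hub", pc)
```

For example, `decode_tagged(make_symbol(5, TAG_COG))` gives cog address 5, while a plain `decode_tagged(0)` is hub byte 0, so ROM boot code could start at $00000.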

cgracey · 2015-09-22 20:15

Seairth wrote: »
Okay, here's another thought: the only time that this addressing overlap is an issue is when you are switching from one exec mode to another. Suppose you used SETQ (well, give it a different mnemonic or set of mnemonics, but same opcode) just prior to any branch that is switching from one mode to another. In all other cases, instruction addressing is strictly limited to the current execution mode.

Your earlier example would look like:
	orgh

	<this is the entry point of your hub-exec program>

hcode	setq	#$1F7			'ready to load $1F8 registers (can be reduced, of course)
	rdlong	0,ptrb[ccode - hcode]	'load registers $000..$1F7 (doesn't need to start at 0)
	setq    #0 			' 0 = cog, 1 = lut, 2 = hub (default)
	jmp	#0			'jump to loaded code in cog RAM
ccode
	org

	<your cog-exec program goes here>

But what about returning from subroutines or going in and out of interrupt code. It has to be address-based, I think.

potatohead · 2015-09-22 20:29

All these other options bring the ugly bits out into ordinary, everyday code!

Treating that small region differently doesn't do that, and it can be completely ignored too. This makes perfect sense, because it's largely about init / boot type activities. Once those are done, the user can just use it as a data area...

The more schemes I see on this, the more I'm with Roy and a couple others. Just don't do it at all.

Seairth · 2015-09-22 20:38

cgracey wrote: »

But what about returning from subroutines or going in and out of interrupt code. It has to be address-based, I think.

Good point. Those would require capturing an additional 2-bit mode value. So, maybe not....

Seairth · 2015-09-22 20:44

Continuing to spitball... if this is mostly about boot code, how hard would it be to just treat the entire instruction address range as hub-only during boot? Then, at some point, you would switch to "normal" mode (whatever that is).

mindrobots · 2015-09-22 20:53

So the bottom bits are the memory mode select bits (they just happen to be at the LSB of the address).

%xxxxxxxxxxxxxxxxxx00 - hub execution with the entire memory map visible to you as a flat address space
%xxxxxxxxxxxxxxxxxx01 - COG execution - we all know it and love it and embrace the special register at the top of the 512 long window
%xxxxxxxxxxxxxxxxxx10 - LUT(1) execution - something new to play with
%xxxxxxxxxxxxxxxxxx11 - LUT(2) execution - maybe something new to play with

My code can jump from space to space and it can vector interrupts to any space. The only perhaps strange program flow thing is you can't have the PC increment from the last long of LUT1 to the first long of LUT2 - they are distinct 512 long address spaces, so you'd need to jump? That's not really an inconsistency since you can't execute from the last COG long to the first LUT long since $1FF is a special register.

I think you had me with LUT2!!

Cluso99 · 2015-09-22 20:59

Chip

cgracey wrote: »

David Betz wrote: »

cgracey wrote: »

potatohead wrote: »

I still don't see the big deal on allowing non aligned code in what could be the boot or system area... seems a nice fix.

All user programs just start at $1000 and that area is for data, or very specialized code...

I actually feel the same way. It seems like the best solution.

Top-justifying cog execution in the hub memory could cause upset on future devices, even expanding LUT memory. Keeping things bottom-justified leaves things more open-ended.

"There is a fifth dimension, beyond that which is known to man. It is a dimension as vast as space and as timeless as infinity. It is the middle ground between light and shadow, between science and superstition, and it lies between the pit of man's fears and the summit of his knowledge. This is the dimension of imagination. It is an area which we call the COG Shadow Zone."

Pit of man's fear. It leans in that direction. It is a raspberry-seed-in-your-wisdom-tooth kind of situation.

Okay...

This is ugly as all get-out, but here's what would work very nicely (consider that this allows for a 1K x 32 LUT):

%000000000xxxxxxxxx01 = cog execution addresses 0..511
%000000000xxxxxxxxx10 = LUT execution addresses 0..511
%000000000xxxxxxxxx11 = LUT execution addresses 512..1023
all others = hub execution addresses

This way, hub-exec would work from $00000 - perfect for ROM booting

Special-consideration memory only goes from $00000 to $007FF.

Nobody would notice these funny %01, %10 and %11 LSB's in cog/LUT addresses because they would be contained in symbols, with their LSB's established by the particular ORGCOG/ORGLUT/ORGLUT2 directive used before their declaration.

I think someone suggested something like this before.

Increasing the LUTs to 1K x 32 would only take about 1.2 mm2 of die area. If we couldn't fit it into this device, it could certainly go into a future smaller-geometry chip. We could implement it on the FPGA, in any case.

This would give 1,528 internal instructions per cog.

P.S. It was Seairth who had proposed something like this on the prior page.

Chip,
Makes perfect sense to me and I'd love to have 1Kx32 LUT if there's die space.

But, why do we have to have hub-exec able to run from non-long aligned code???
Seems to me that we have the cart before the horse and that is complicating the PC counter.

Why couldn't we just address all instructions on long boundaries and save the 2 bits (and its complications for the masses to understand)?
The PC would contain an extra 2 (hidden) bits (that could be extended in future P2's) to designate COG/LUT/HUB.
The jump/call/return instructions would still contain these 2 bits, but the compiler would insert these depending on whether the address was in COG/LUT/HUB.

But simplifying even further, there should be no reason to differentiate the COG/LUT so we can have seamless instruction addresses from COG $000-$3FF (or $5FF), ignoring the special register gap. The compiler will just insert these 2 address bits.

So, in reality, the PC would be the same as you have now, just that it would increment by 4, and the last 2 bits would be defined as you have suggested here but would be hidden from the user (except in the case of actual hand assembly).

So, I am just saying, hide these 2 bits from the user. Hope I have made this clear enough.

BTW We can live with the extra 2 bits being the address for simplifying your pnut compiler.

Cluso99 · 2015-09-22 21:15

Continuing on from my previous post...

Presuming we have a contiguous COG-then-LUT address space, $000..$3FF/$5FF:

Why can't the instructions RD/WR-LONG/WORD/BYTE & SETQ address the COG/LUT directly (i.e. the D register contains a long address in the $000..$3FF/$5FF COG+LUT range)? This would remove the requirement for SETL and RD/WR-LUT instructions.
