Aha... I see. Very tricky: you do not need to waste instructions setting pointers A and B, since they will be reset in hardware at normal cog load time. Nice. Thanks, Roy.
That covers stack-type usage. Now we only need more info on FIFO use in the modes I described some posts back.
Is it possible, perhaps with pointers limited to 32/64/128 longs, to combine a FIFO and a stack in the same session of a cog's run?
I got permission from Chip to post the planned changes to the CLUT memory access instructions.
First, there will be two pointers into the CLUT memory, A and B. This allows you to have two stacks, perhaps one normal one and one expression solver one. You can even have the two stacks build in opposite directions (one at the end building down and the other at the beginning building up).
In the lists below, # is a constant and D is a register.
Here are the instructions to push/pop to/from the CLUT:
PUSHA D/# - write D/# via pointer A, and post increment A
PUSHB D/# - write D/# via pointer B, and post increment B
PUSHDNA D/# - pre decrement A, then write D/# via pointer A
PUSHDNB D/# - pre decrement B, then write D/# via pointer B
POPA D - pre decrement A, and read via pointer A
POPB D - pre decrement B, and read via pointer B
POPUPA D - read via pointer A, and post increment A
POPUPB D - read via pointer B, and post increment B
Here are the instructions for manipulating the pointers (for add and sub it wraps at 7 bits):
SETSPA D/# - write pointer A
SETSPB D/# - write pointer B
GETSPA D - read pointer A
GETSPB D - read pointer B
ADDSPA D/# - add to pointer A
ADDSPB D/# - add to pointer B
SUBSPA D/# - subtract from pointer A
SUBSPB D/# - subtract from pointer B
And finally here are the call/return instructions:
CALLA D/# - write the return address and C & Z flags to the CLUT at A, then increment A, and jump to D/#
CALLB D/# - write the return address and C & Z flags to the CLUT at B, then increment B, and jump to D/#
CALLAD D/# - same as CALLA, but executes the two instructions after this one
CALLBD D/# - same as CALLB, but executes the two instructions after this one
RETA - decrement A then read the value in the CLUT pointed to by A, and jump to that address.
if WC and/or WZ are specified then restore those flags before jumping.
RETB - decrement B then read the value in the CLUT pointed to by B, and jump to that address.
if WC and/or WZ are specified then restore those flags before jumping.
RETAD - same as RETA, but executes the two instructions after this one.
RETBD - same as RETB, but executes the two instructions after this one.
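To make the push/pop pairings above easier to follow, here is a small Python model of the pointer behaviour as described in the list. It is only a sketch, not PASM and not Chip's implementation: the class and method names are made up for illustration, and it assumes that every pointer update (not just ADDSP/SUBSP) wraps at 7 bits and that SETSP masks its argument the same way.

```python
CLUT_SIZE = 128            # 128 longs, so pointers wrap at 7 bits
MASK = CLUT_SIZE - 1

class Clut:
    """Toy model of the CLUT with pointers A and B (names are hypothetical)."""

    def __init__(self):
        self.mem = [0] * CLUT_SIZE
        self.ptr = {"A": 0, "B": 0}

    def setsp(self, p, value):          # SETSPA / SETSPB (masking is an assumption)
        self.ptr[p] = value & MASK

    def push(self, p, value):           # PUSHA / PUSHB: write, then post-increment
        self.mem[self.ptr[p]] = value & 0xFFFFFFFF
        self.ptr[p] = (self.ptr[p] + 1) & MASK

    def pop(self, p):                   # POPA / POPB: pre-decrement, then read
        self.ptr[p] = (self.ptr[p] - 1) & MASK
        return self.mem[self.ptr[p]]

    def pushdn(self, p, value):         # PUSHDNA / PUSHDNB: pre-decrement, then write
        self.ptr[p] = (self.ptr[p] - 1) & MASK
        self.mem[self.ptr[p]] = value & 0xFFFFFFFF

    def popup(self, p):                 # POPUPA / POPUPB: read, then post-increment
        value = self.mem[self.ptr[p]]
        self.ptr[p] = (self.ptr[p] + 1) & MASK
        return value

# Two stacks building in opposite directions:
clut = Clut()
clut.setsp("A", 0)         # stack A builds up from the beginning
clut.setsp("B", 0)         # pre-decrement wraps 0 -> 127, so B builds down from the end
clut.push("A", 11)
clut.push("A", 22)
clut.pushdn("B", 33)
clut.pushdn("B", 44)
up = [clut.pop("A"), clut.pop("A")]          # PUSHA pairs with POPA
down = [clut.popup("B"), clut.popup("B")]    # PUSHDNB pairs with POPUPB
print(up, down)
```

The pairing is the point: PUSHx with POPx gives a stack that grows upward, and PUSHDNx with POPUPx gives one that grows downward.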
So with a 128-entry CLUT, are we limited to hardware support for 128 colors out of a palette of over 16 million?
Or maybe split each entry in two and have access to 256 colors out of 64K colors?
How is he adding all of these instructions? I thought he was bit-limited in the instruction set.
After a bit of scribbling, the following should work for using the CLUT as a FIFO:
' initialize FIFO
' i am guessing A/B will be set to 0 on COGNEW?
SETSPA #0 ' initial write pointer
SETSPB #0 ' initial read pointer
' note if A==B fifo is empty
' add src to a FIFO
PUSHA src
' get dst from FIFO
POPUPB dst
I LOVE the dual stack capability! It allows for separate return and expression evaluation stacks!
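For anyone who wants to play with the FIFO idea before silicon exists, the following is a rough Python model of it. It is a sketch under assumptions: pointer A is the write pointer (PUSHA), pointer B is the read pointer (POPUPB), both wrap at 7 bits, and the class name and the full/empty checks are mine, not part of the planned hardware. One slot is kept free, because once A wraps around and equals B again, a full FIFO would otherwise look exactly like an empty one.

```python
FIFO_SIZE = 128
FIFO_MASK = FIFO_SIZE - 1

class ClutFifo:
    """Toy model of the CLUT used as a FIFO (hypothetical helper, not hardware)."""

    def __init__(self):
        self.mem = [0] * FIFO_SIZE
        self.a = 0     # write pointer, as after SETSPA #0
        self.b = 0     # read pointer, as after SETSPB #0

    def is_empty(self):
        return self.a == self.b

    def is_full(self):
        # One slot stays unused so that full and empty remain distinguishable.
        return ((self.a + 1) & FIFO_MASK) == self.b

    def put(self, value):              # PUSHA src
        if self.is_full():
            raise OverflowError("FIFO full")
        self.mem[self.a] = value
        self.a = (self.a + 1) & FIFO_MASK

    def get(self):                     # POPUPB dst
        if self.is_empty():
            raise IndexError("FIFO empty")
        value = self.mem[self.b]
        self.b = (self.b + 1) & FIFO_MASK
        return value

fifo = ClutFifo()
for v in (0, 10, 20):
    fifo.put(v)
out = [fifo.get() for _ in range(3)]   # values come back in insertion order
print(out, fifo.is_empty())
```

In real PASM the full/empty checks would cost extra instructions (for example reading both pointers with GETSPA/GETSPB and comparing them), which is why hardware detection would be attractive.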
We don't need code protection now, we never had it before. We have always felt that if we keep developing new products there is no way anyone could keep up by trying to hack our stuff. We would always be 6 months ahead of them.
hinv:
There are a couple of different ways to do video. One is to use the CLUT as a palette, as you are thinking, where you'd have 128 colors at a time in 24-bit color (plus an 8-bit alpha), or 256 colors at a time in 15-bit color (the remaining bit is an alpha). The other involves pushing colors onto the CLUT and having the video instructions pull them off, so that you can get more colors via a form of streaming if you have enough external memory for a full bitmap.
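The entry counts follow directly from the CLUT being 128 longs of 32 bits each. The Python sketch below just does that arithmetic and shows one way the two entry formats could be packed. The bit layouts (alpha in the top bits, two 16-bit entries per long, low half first) are my assumptions for illustration; the actual layout has not been posted.

```python
CLUT_LONGS = 128
BITS_PER_LONG = 32

def entries(bits_per_entry):
    """Number of palette entries that fit in the CLUT at a given entry width."""
    return CLUT_LONGS * BITS_PER_LONG // bits_per_entry

def pack_argb8888(a, r, g, b):
    """One 32-bit entry: 8-bit alpha plus 24-bit color (layout is assumed)."""
    return (a << 24) | (r << 16) | (g << 8) | b

def pack_a1rgb555(a, r, g, b):
    """One 16-bit entry: 1-bit alpha plus 15-bit color (layout is assumed)."""
    return (a << 15) | (r << 10) | (g << 5) | b

def pack_pair(lo, hi):
    """Two 16-bit entries sharing one long (ordering is assumed)."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

print(entries(32), "entries chosen from", 2 ** 24, "colors")   # 24-bit mode
print(entries(16), "entries chosen from", 2 ** 15, "colors")   # 15-bit mode
```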
jmg: I believe the counters are 2 per cog, and not per pin.
FYI, a trivial mod to the pushes and pops VHDL would allow easy detection of fifo full/empty conditions, and a tad more would allow for stack collision detection.
Ken would kill me if I distracted Chip with it, so my lips are sealed until after first working test run
(there, that should get me my first test chip faster)
Is this color lookup table like color registers on the old 8-bit computers (atari or similar)? I remember being able to do some nice dynamic color effects with that.
BigFoot, it appears that the Prop 2 isn't actually providing code protection, except for a memory location that can be used to save an internal encryption key. Of course, this memory location could be used for some other purpose, such as holding a MAC ID or a manufacturing serial number. Any encryption that is done would be in software. Does anyone know for sure whether this is the case or not?
You are absolutely correct; if you use it as a FIFO, you really can't use it for anything else (there, I said it, now someone will prove me wrong by figuring out a clever way of having one stack and one fifo in the CLUT at the same time!)
If the CLUT is used as a stack, it can support two stacks in the 128 longs, and it is actually possible for a debugger to watch for stack collision, overflow and underflow with a bit of code.
To use it as both a stack and a FIFO, it needs at least 3 pointer registers: one for the stack and two for the FIFO. The FIFO registers also need programmable wraparound.
Dave,
My understanding is that the bootloader ROM would use the key to decrypt the external eeprom data as it reads it into the ram, and also use the key to encode the data as it writes it into the eeprom when you are programming it. So the eeprom image would be "protected".
That FIFO will stop you from using this memory as a stack at the same time.
You can implement a FIFO with software pointers (even more than one FIFO). This only needs one of the CLUT pointers, so you can use the second for a stack:
SETSPA #64 'for Stack (64)
mov head,#0 'for FIFO (64)
mov tail,#0
PUSHA data 'stack access
POPA data
SETSPB head 'write to FIFO
PUSHB data
add head,#1
and head,#63
SETSPB tail 'read from FIFO
POPUPB data
add tail,#1
and tail,#63
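Here is a rough Python model of the snippet above, in case it helps to see the two uses side by side. It is a sketch only: the class and method names are invented, and it assumes the hardware pointers wrap at 7 bits. Pointer A serves the stack in the upper 64 longs, while pointer B is reloaded from the software `head` or `tail` before each FIFO access in the lower 64 longs.

```python
class SplitClut:
    """Toy model: stack on pointer A, software-pointer FIFO on pointer B."""

    def __init__(self):
        self.mem = [0] * 128
        self.a = 64        # SETSPA #64: stack lives in longs 64..127
        self.b = 0         # hardware pointer B, reloaded before each FIFO access
        self.head = 0      # mov head,#0 (FIFO write index, kept in a cog register)
        self.tail = 0      # mov tail,#0 (FIFO read index, kept in a cog register)

    def stack_push(self, value):       # PUSHA data
        self.mem[self.a] = value
        self.a = (self.a + 1) & 127

    def stack_pop(self):               # POPA data
        self.a = (self.a - 1) & 127
        return self.mem[self.a]

    def fifo_write(self, value):
        self.b = self.head             # SETSPB head
        self.mem[self.b] = value       # PUSHB data
        self.b = (self.b + 1) & 127
        self.head = (self.head + 1) & 63   # add head,#1 / and head,#63

    def fifo_read(self):
        self.b = self.tail             # SETSPB tail
        value = self.mem[self.b]       # POPUPB data
        self.b = (self.b + 1) & 127
        self.tail = (self.tail + 1) & 63   # add tail,#1 / and tail,#63
        return value

clut = SplitClut()
clut.stack_push(1)
clut.stack_push(2)
for v in range(65):                    # one more than the FIFO region holds
    clut.fifo_write(v)
first = clut.fifo_read()               # long 0 was overwritten by the 65th write
popped = [clut.stack_pop(), clut.stack_pop()]
print(first, popped)
```

Because `head` and `tail` are masked with #63, the FIFO never leaves the lower 64 longs, so the stack is safe from it. As in the original snippet, nothing checks for full or empty, so the 65th write silently overwrites the oldest unread entry.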
Yes Andy. We can actually make the CLUT act as multiple fifos this way (one tx, one rx) but what we were really trying to achieve was to replace your 4 instructions with 1 instruction.
Sapieha correctly determined that 3 pointers would be a lot more useful. Indeed, with 3 pointers we could have a stack at each end (2 stacks) plus a variable using the middle section of the clut. This would really make it a lot more useful. A compare instruction between 2 pointers to set the C & Z flags would also be helpful, particularly if using them as a fifo. And a wraparound modifier/register would also be helpful for creating fifos.
Unfortunately, we do not know how complex this is to create and how much silicon it uses.
So, ideally I would like 4 pointer registers, plus 4 mode registers where the mode registers set the pre/post increment/decrement (or none) for read (pop) and write (push) and wraparound size. Then there would be only 4 push (PUSHA..PUSHD) and 4 pop (POPA..POPD) instructions and 4 instructions to set the pointer (SETSPA...SETSPD) and 4 instructions to set the mode for each pointer (SETSPMA...SETSPMD).
In fact, aren't there other pointer registers that have pre/post increment/decrement options? Perhaps they could all share the same style of associated mode registers?
Roy:
Thank you for this information on the CLUT/stacks. This is making the Propeller II assembly language come close to a more advanced 68K-style assembly with no interrupts.
Dang, now I have to rewrite 80% of my Propeller II code.
I am home now (yay, 9 hour drive). I extracted a ton of information from Chip about the new instructions and I have lots of notes. I will be making posts over the next few days with the information that Chip agreed I could talk about. Some will have more detail than others, but over time we will get more details. Some of the details are not available because I ran out of time before I had to come home, and some are left out because there might be tweaks to things due to critical paths.
Ok, so many interesting aspects of this; one that has me intrigued... the fuse bits...
If I understand this correctly, this would permit software-level encryption or a hard-wired serial number. This would permit someone to use a read instruction to obtain its value to be used elsewhere; is this about correct? Someone already stated something about decryption routines and the boot loader already being in ROM, but not too long after that someone said the ROM was gone, so now I'm so confused I don't know if my ROM's been RAMed off the board or not. Could someone in the know please let me know?
#1: Rom image contains boot loader and ???
#2: Fused bits are readable by PASM / SPIN instructions ?
#3: Support for decryption ?
#4: Have the RAM sizes per cog and hub changed?
Ok, I don't want to be greedy, but I'm so waiting for this...
Oh, one last thingie...
Because the pins / pinouts of the Prop II are set, more than likely in stone by now, are the dev boards being developed for deployment at about the same time?
I for one will need at least 1 or 2, similar to the Propeller Proto Board (we'll need accessories too)...
I think something like this ASAP with the Prop II coming out would be helpful, and then, at a later time, something bigger / better to properly work some of its super capabilities.
I'm also looking forward to any information you can provide about the instruction set. At this point I'm mostly interested in the instructions related directly to the processor core, and not so much the peripheral instructions that are used for ADC, DAC, counters and CLUT. I'm assuming the current jump instructions will incur a 3-cycle wait as the instruction pipeline refills. That is, the current jumps will require 4 cycles instead of 1, correct? And will the delayed jumps execute the following 3 instructions before jumping, or is it the following 4 instructions?
Can you also provide details on the multiplier? Is it 16x16, 32x32 or something else? Is it twos-complement or unsigned? What are the details of the macro-instructions that take multiple cycles to complete, such as the divide and CORDIC instructions? I suspect much of this was presented at UPEW, and there might be a PowerPoint document that answers these questions, but I haven't seen one posted. Are there any documents from UPEW with the answers?
@KaosKidd,
I'm just reading the same blog that you're reading and listening to Chip on video, but most of your questions have already been answered:
#1: The ROM will now be a specially built part of the RAM that you can't change. It will contain the bootloader. That's all. Other stuff (like font tables or transcendental tables) will have to be read in from an EEPROM or SD card or be downloaded from a PC.
#2: Yes
#3: Yes. The part of the bootloader that writes the contents of RAM to EEPROM will be able to encrypt the data as it's being written using the fused bits as the key. Similarly, the part of the bootloader that reads external storage will be able to use the fused bits to decrypt the data as it's read.
#4: Cog RAM hasn't changed. There will be some additional RAM attached to each cog (CLUT) for use as a Color LookUp Table for video generation that will also be able to be accessed for other uses (mostly for stacks) when not used for video, but it's not an extension of the existing RAM. There's also some ROM attached to the cogs for CORDIC tables. Hub RAM will probably be on the order of 192K, but this will include the "ROM" for the bootloader.
No one has said anything about development boards, but, given previous comments, there will probably be several of them available at the same time as the chip is available for general use (as opposed to early engineering samples).
I'm very curious as to how this encryption is going to work...
I think it's going to require the key to be hidden from everything but the bootloader.
and, the bootloader itself would have to be encrypted...
Or maybe they'll just make it so that the bootloader code cannot be read, instead of encrypting it...
As long as the key is hidden from all but the bootloader, there is no reason to hide or encrypt the bootloader itself. In fact, it would be a mistake to depend on the secrecy of the bootloader for security, since security by obscurity always eventually fails.
Comments
Thanks to Chip and You.
Does anyone know how many counters there are per pin, what size and speed they are, and whether you can divide by N and capture, for proper frequency counters?
Doug
Thanks for posting this... looks good!
Bill: Thanks for posting the code to do a FIFO!
It's something like that yes. You can dynamically manipulate the CLUT to create cool color animation effects.
I described that some posts above.
I know... but I fear to ask for any more features
Maybe I should start a Propeller 3 wish list thread...
For my uses, using the CLUT as a dual stack is more useful than a FIFO.
In an ideal world, 4 pointers and a larger CLUT would be better, but I'll wait for Prop3 for that. It will take a while to digest Prop2
Andy
I guess we are just being greedy
In this case I think greed is good.....as long as it does not delay prop2.
Thanks,
Dave
-Phil
Believe it or not, I got lost in all the talk...
KK