Propeller II update - BLOG

Cluso99 · 2013-12-20 14:10

potatohead wrote: »

Thanks!!

With all the changes, I'm gonna have to just go start coding a lot of broken PASM again! So nice to have a basic stack!

Unless you actually want to use the newer instructions and features, most of the work is behind the scenes (such as new opcode mapping), so you can just recompile

I have just been reworking my P2 Debugger - it uses LMM and hardly any P2 features so that I can backport to P1.
But I cannot resist using hubexec mode (so I will have to fork it)

Already using the UART - it saves two pages of code, and those were only the common TX Char and RX Char routines called by higher-level PASM routines.

max72 · 2013-12-20 14:23

About DRC test:
I understand it is mandatory to do the final check on the validated machine, but wouldn't it be possible to do the previous iterations on another, faster machine?
After having fixed everything on the faster, non-validated one, only the final check would have to be run on the validated one.
Every change would be done on Beau's machine, and the new file exported and checked on the other one, license permitting.
It would speed up things with minimal risk.
If a DRC error proves to be wrong then revert to the validated one.

Monitor Lizard · 2013-12-20 19:31

cgracey wrote: »

I just need to find out why nothing's working. I hard-coded the TRACE into the Verilog so I'll be able to see on my logic analyzer what's happening internally. Hopefully, it's something identifiable. Whenever nothing works after a big change, I'm always preoccupied until I get it straightened out. I hate to have something so huge up in the air, especially at 4:40am.

Off topic question: What logic analyzer do you use?

hinv · 2013-12-20 19:32

Heater. wrote: »

rod1963,
(Please, nobody bring up that stupid Green Arrays thing).

Well, since you brought it up, what is stupid about it? OK, well beyond a couple of Chuckisms that don't seem to me to be good decisions like the meaning of each directional port depending on which core it is, and 18 bit instead of 32bit, what is stupid about it?

There are some really revolutionary things about it like:
* Implementing an extensible language (Forth) in hardware for speed and efficiency
* Asynchronous operation
* Mailbox registers for high speed communication between processors
* Automatically sleep the processor until a port can be read or written to
* Having enough processors in the array that you can unwrap an iterative loop into a physical loop of processors so that each processor handles a part of the loop

Now, I admit, I am not as good a programmer as you, nor do I have as much experience with low level programming on a variety of hardware as you, but maybe you missed some of the advantages of that other unmentionable.
Maybe I am missing some big stupidity...enlighten me.

Doug

Cluso99 · 2013-12-20 20:21

There is an error in the instruction mapping I did a couple of days ago.
Here's what's wrong - different format - but Chip is about to release a new map, so there's no point in fixing my files.

 byte " PUSH    <empty> <empty> <empty>"  '--L- 1111111 00 L CCCC DDDDDDDDD 0110111ff  D/#                    
 byte " <empty> <empty> <empty> <empty>"  '--L- 1111111 00 L CCCC DDDDDDDDD 0111000ff  D/#                     
 byte " <empty> <empty> JMPREL  JMPRELD"  '--L- 1111111 00 L CCCC DDDDDDDDD 0111001ff  D                    
 byte " JMP     JMP_    JMPD    JMPD_  "  '--R- 1111111 00 0 CCCC DDDDDDDDD 0111010ff  D

cgracey · 2013-12-20 20:39

Monitor Lizard wrote: »

Off topic question: What logic analyzer do you use?

It's an old 100MHz Agilent MSO that has two analog channels and 16 digital channels. It's basically a scope with a logic tracer built in.

cgracey · 2013-12-20 20:48

hinv wrote: »

Well, since you brought it up, what is stupid about it? OK, well beyond a couple of Chuckisms that don't seem to me to be good decisions like the meaning of each directional port depending on which core it is, and 18 bit instead of 32bit, what is stupid about it?

There are some really revolutionary things about it like:
* Implementing an extensible language (Forth) in hardware for speed and efficiency
* Asynchronous operation
* Mailbox registers for high speed communication between processors
* Automatically sleep the processor until a port can be read or written to
* Having enough processors in the array that you can unwrap an iterative loop into a physical loop of processors so that each processor handles a part of the loop

Now, I admit, I am not as good a programmer as you, nor do I have as much experience with low level programming on a variety of hardware as you, but maybe you missed some of the advantages of that other unmentionable.
Maybe I am missing some big stupidity...enlighten me.

Doug

From what I've seen, Chuck Moore has some really neat ideas about how to do things very simply. He made a chip layout tool that used pre-built multi-layer tiles (from implant layers up to top metal) that could be arranged in all kinds of patterns without violating physical design rules. This made the layout editor dirt simple and lightning fast. There would be a density hit in taking such an approach, but if you optimized the blocks really well, it could signal very quickly. By taking that approach, he solved hundreds of problems at once. Today, every bit of juice gets squeezed out of the silicon area, but it takes very expensive and complex tools to do. You get to work at a very high level of abstraction, though. Chuck's ASIC design methodology keeps you close to the transistors. If you have a very simple design, that's just fine.

hinv · 2013-12-20 21:41

You are speaking above my head when it comes to chip layout, but maybe it is this layout tool that made him flip the different cores in mirror images. It's a neat design trick, but quite confusing because the Up Down Left and Right ports don't mean what they say in 3/4ths of the cores. That said, I would love to see a chip with the features his has but with full 32 bits, more memory, and better mapped ports which I think you are saying could be done with the big guns (layout tools) that you guys have.

I like having a chip I can get my head wrapped around at the assembly level, which is why I like small instruction sets (the F18A is a bit too small IMHO). The Propeller 1 I can understand even as a hobbyist. After trying to catch up on this thread, I fear that the Propeller 2 is going to take me more time to figure out than I have. Of course, it won't be a moving target once we have silicon.
To me, it would have been a great incremental step to get the Prop1 fully filled out with mul and div, higher clock speeds, serdes, 64 IOs and more hub memory...like 62K RAM, 2K ROM, and still at low power.
But that is all water under the bridge now.
Going from the Prop1 to the Prop2 seems to me like going from the Vic20 straight to the Amiga, especially with all of the bit manipulation instructions being considered.

It'll be quite cool if it can be pulled off before things go bad.

Cluso99 · 2013-12-20 21:59

Chip,
Did you get a good night/morning sleep ?

rod1963 · 2013-12-20 22:39

hinv,

Moore's latest creation isn't that attractive to mainstream commercial consumers of microprocessors. For starters he forces you to use Forth which isn't very popular anymore nor a language that lends itself to parallel processing. Had Moore released a C variant along with some other tools that could exploit the chip without forcing people to become Forth gurus, I'd wager the GA144 might have a shot at design wins instead of languishing. It is too bad the people peddling the chip couldn't get over their attachment to Forth as the end all and be all of development tools.

The other thing that gets me is that even the GA people can't figure out what sort of wow demo application could make their chip stand out from the PICs, ARMs, MSP430s, HC12s, etc. That's not good.

cgracey · 2013-12-20 23:40

Cluso99 wrote: »

Chip,
Did you get a good night/morning sleep ?

I did. I didn't get any Prop work done today, though. I took care of some things around the house that had been needing attention and then we went to a Christmas dinner at church. I'm going to just sleep tonight because I'm feeling kind of worn out. I should be back on it tomorrow evening, if not earlier.

Cluso99 · 2013-12-21 00:22

cgracey wrote: »

I did. I didn't get any Prop work done today, though. I took care of some things around the house that had been needing attention and then we went to a Christmas dinner at church. I'm going to just sleep tonight because I'm feeling kind of worn out. I should be back on it tomorrow evening, if not earlier.

Well, you certainly deserve a good rest and it does wonders for the brain too!

I have been working on a lookup table for the opcodes 1000100-1111100 (ENCOD - JNZD) for disassembly. I have the others correctly decoding the instruction opcodes, but have yet to do the operands for them. But this group is a bit more complex. I think I have it worked out now - it's a lookup table looking up a table.

potatohead · 2013-12-21 00:56

Unless you actually want to use the newer instructions and features, most of the work is behind the scenes (such as new opcode mapping), so you can just recompile

Yep. But I've got to just dig in again this weekend. End of year was kind of brutal for me. Fun times next week!

Heater. · 2013-12-21 03:39

hinv,

I said "Please, nobody bring up that stupid Green Arrays thing" and what do you do?

You caught me out. I was making a very brazen statement out of almost total ignorance of what I'm talking about.

My gut feeling is that, for the reasons you listed and others mentioned here, nobody is ever going to figure out how to do anything useful with the GA devices.

I'm going to have to go back and read over the GA pages again to see if I have missed the point somewhere.

David Betz · 2013-12-21 04:38

Heater. wrote: »

hinv,

I said "Please, nobody bring up that stupid Green Arrays thing" and what do you do?

You caught me out. I was making a very brazen statement out of almost total ignorance of what I'm talking about.

My gut feeling is that, for the reasons you listed and others mentioned here, nobody is ever going to figure out how to do anything useful with the GA devices.

I'm going to have to go back and read over the GA pages again to see if I have missed the point somewhere.

I found the GA device interesting but the last I knew they didn't offer an affordable development board so I never got a chance to play with one.

hinv · 2013-12-21 06:20

Yeah, the price tag has stopped me too. They are shooting themselves in the foot with that. They also don't have nearly as good support. The Parallax forums alone provide enough to get a poor noobie started with the Prop for about $20.
The other thing that has stopped me is the interfacing at 1.8V. I figured that when I get a Prop2 I could put the two together, using the GA144 as a coprocessor of sorts.

Cluso99 · 2013-12-21 12:46

Chip has generally asked about extra instructions because there is silicon space. Here is one group that I regularly use:

SETZC D/# [WZ],[WC]

It currently exists and sets the Z & C flags via WZ & WC according to D[1:0]

What would be nice is to extend this instruction to...

SETZC D/#,#0..31 [WZ],[WC]

where it first rotates right (NR) #0:31, then sets the Z & C flags via WZ & WC according to the resulting D[1:0]; The result D is not written.
This also allows for the case where the bits are in b0 & b31 which would use ror of #31
Obviously a rotate of #0 is the default and acts like the original instruction.

This permits the instant decoding of a pair of bits anywhere in D. I often use SHR with NR to decode 2 bits but we no longer have the NR option.
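The proposed behaviour can be modelled in a few lines of Python. This is only a sketch of the semantics described above, not of the actual hardware: the function names `ror32` and `setzc` are made up for illustration, and the mapping of D[1] to Z and D[0] to C is an assumption, since the post only says the flags follow D[1:0].

```python
MASK32 = 0xFFFFFFFF

def ror32(value, count):
    """Rotate a 32-bit value right by count bits (count taken modulo 32)."""
    value &= MASK32
    count &= 31
    return ((value >> count) | (value << (32 - count))) & MASK32

def setzc(d, rot=0):
    """Model of the proposed SETZC D/#,#rot. D itself is never written.

    Assumption (not stated in the post): Z is taken from bit 1 and
    C from bit 0 of the rotated value.
    """
    rotated = ror32(d, rot)
    z = (rotated >> 1) & 1
    c = rotated & 1
    return z, c

# A rotate of 0 behaves like the existing instruction.
print(setzc(0b10))            # flags taken from D[1:0]
# A rotate of 31 brings the original b0 and b31 together in bits 1 and 0.
print(setzc(0x80000001, 31))
```

In this model a rotate of #31 is the same as a rotate left by 1, which is why the b0/b31 pair ends up in the two low bits.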

Bill Henning · 2013-12-21 13:29

That looks really handy!

Cluso99 wrote: »

Chip has generally asked about extra instructions because there is silicon space. Here is one group that I regularly use:

SETZC D/# [WZ],[WC]

It currently exists and sets the Z & C flags via WZ & WC according to D[1:0]

What would be nice is to extend this instruction to...

SETZC D/#,#0..31 [WZ],[WC]

where it first rotates right (NR) #0:31, then sets the Z & C flags via WZ & WC according to the resulting D[1:0]; The result D is not written.
This also allows for the case where the bits are in b0 & b31 which would use ror of #31
Obviously a rotate of #0 is the default and acts like the original instruction.

This permits the instant decoding of a pair of bits anywhere in D. I often use SHR with NR to decode 2 bits but we no longer have the NR option.

Baggers · 2013-12-21 14:00

Chip, seeing Cluso's request for rotation sparked an idea.

Don't worry, I know it's got two writes, so probably couldn't be done.

Basically it was an instruction like MARB2/4/8 D,S which would, depending on the 2/4/8 variant, MOVE Source into Destination ANDed with a mask (3 for 2, 15 for 4, 255 for 8), then ROTATE the source 2, 4 or 8 bits to the right.

Maybe just MAND2/4/8 D,S: MOVE S into D ANDed with 3/15/255. It could even cover 3 bits as well, ANDing with 3/7/15/255 and using 2 bits from the flags part of the instruction to select the 2/3/4/8 bit count. That way it's just one instruction, and the rotate can be done in a second instruction.

Edit: no actually, save it for P3, you've done a lot of graphical enhancements for the P2 already!

Cluso99 · 2013-12-21 17:59

May I also suggest that the instruction

SETX D,S/#

where D[22:18] of an instruction (I and CCCC bits) are modified
to...

SETCCCC D,S/#

where D[21:18] of an instruction (only the CCCC bits) are modified

It seems to me that it would be way more useful to be able to change just the conditional bits (CCCC) of an instruction (i.e. preserving the "I" bit rather than changing it).
The I (or L) bit can be changed using the bit instructions if necessary.
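To make the difference concrete, here is a small Python model of the two variants. It is a sketch only: the function names are illustrative, and it assumes the new field value comes from the low bits of S, which the post does not spell out.

```python
MASK32 = 0xFFFFFFFF
COND_SHIFT = 18
COND_MASK = 0xF << COND_SHIFT    # D[21:18]: the CCCC bits only
X_MASK = 0x1F << COND_SHIFT      # D[22:18]: the I bit plus CCCC

def setx(d, s):
    """Behaviour as described for SETX: replace D[22:18] (assumed from S[4:0])."""
    return ((d & ~X_MASK) & MASK32) | ((s & 0x1F) << COND_SHIFT)

def setcccc(d, s):
    """Proposed SETCCCC: replace only D[21:18] (assumed from S[3:0])."""
    return ((d & ~COND_MASK) & MASK32) | ((s & 0xF) << COND_SHIFT)

instr = 0xFFFFFFFF               # arbitrary word with the I bit (bit 22) set
print(hex(setx(instr, 0)))       # clears bit 22 along with CCCC
print(hex(setcccc(instr, 0)))    # bit 22 survives
```

With the narrower mask, bit 22 is untouched, so a separate bit instruction is only needed in the cases where the I bit really has to change.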

ozpropdev · 2013-12-21 18:14

Cluso99 wrote: »

May I also suggest that the instruction

SETX D,S/#

where D[22:18] of an instruction (I and CCCC bits) are modified
to...

SETCCCC D,S/#

where D[21:18] of an instruction (only the CCCC bits) are modified

It seems to me that it would be way more useful to be able to change just the conditional bits (CCCC) of an instruction (i.e. preserving the "I" bit rather than changing it).
The I (or L) bit can be changed using the bit instructions if necessary.

That makes sense to me too.
The I bit in an instruction can simply be modified with SETB,CLRB etc. if required.
May I suggest SETCOND as an alternative name.

Cluso99 · 2013-12-21 18:19

Baggers wrote: »

Chip, seeing Cluso's request for rotation sparked an idea.

Don't worry, I know it's got two writes, so probably couldn't be done.

Basically it was an instruction like MARB2/4/8 D,S which would, depending on the 2/4/8 variant, MOVE Source into Destination ANDed with a mask (3 for 2, 15 for 4, 255 for 8), then ROTATE the source 2, 4 or 8 bits to the right.

Maybe just MAND2/4/8 D,S: MOVE S into D ANDed with 3/15/255. It could even cover 3 bits as well, ANDing with 3/7/15/255 and using 2 bits from the flags part of the instruction to select the 2/3/4/8 bit count. That way it's just one instruction, and the rotate can be done in a second instruction.

Edit: no actually, save it for P3, you've done a lot of graphical enhancements for the P2 already!

This area of the instruction set is pretty tight already. However, I would expect that the immediate mode of S/# might be able to be removed???, and if so, it might be able to marry with an instruction that only uses the immediate #S version.

Is this basically what you are after ???

AND D,#mask
  where "2" uses mask = %11, "4" uses mask = %1111, "8" uses mask = %11111111
OR D,S/#
  where "2" ORs S[1:0], "4" ORs S[3:0], "8" ORs S[7:0]
ROR D,#bits
  where "2" uses bits=2, "4" uses bits=4, "8" uses bits=8

This could be encoded in a single instruction using ZC to encode 2/4/8 (or 2/3/4/8), although again, it could share an instruction using the fourth, unused ZC combination if only 2/4/8 were used.

How would it be used?
How common/often is this instruction used?

Cluso99 · 2013-12-21 18:27

ozpropdev wrote: »

That makes sense to me too.
The I bit in an instruction can simply be modified with SETB,CLRB etc. if required.
May I suggest SETCOND as an alternative name.

Sure. I thought SETC would be confusing.

However, I think I prefer the older MOVS, MOVD, MOVI & MOVC/MOVCCCC/MOVCOND.
I do understand Chip's thinking they are all part of SET, but I see them differently.
Anyone have a preference? (They are just semantics for the compiler)

ozpropdev · 2013-12-21 18:59

Cluso99 wrote: »

Sure. I thought SETC would be confusing.

However, I think I prefer the older MOVS, MOVD, MOVI & MOVC/MOVCCCC/MOVCOND.
I do understand Chip's thinking they are all part of SET, but I see them differently.
Anyone have a preference? (They are just semantics for the compiler)

SETC would be confusing, MOVC seems to fit well in the old MOVS/D/I model.
It didn't bother me one way or the other, but now thinking about it I think I prefer the old MOV style.
Just my $0.02 worth of input.

ozpropdev · 2013-12-21 20:59

Cluso99 wrote: »
This area of the instruction set is pretty tight already. However, I would expect that the immediate mode of S/# might be able to be removed???, and if so, it might be able to marry with an instruction that only uses the immediate #S version.

Is this basically what you are after ???
AND D,#mask
  where "2" uses mask = %11, "4" uses mask = %1111, "8" uses mask = %11111111
OR D,S/#
  where "2" ORs S[1:0], "4" ORs S[3:0], "8" ORs S[7:0]
ROR D,#bits
  where "2" uses bits=2, "4" uses bits=4, "8" uses bits=8
This could be encoded in a single instruction using ZC to encode 2/4/8 (or 2/3/4/8), although again, it could share an instruction using the fourth, unused ZC combination if only 2/4/8 were used.

How would it be used?
How common/often is this instruction used?

I think that's what Baggers is talking about except the SOURCE is rotated not the DEST.

Edit: I assume it's for separating pixel data?

potatohead · 2013-12-21 21:24

How common/often is this instruction used?

Lots if you are taking pixel streams to be combined, packed into LONGS.

Cluso99 · 2013-12-21 22:09

ozpropdev wrote: »

I think that's what Baggers is talking about except the SOURCE is rotated not the DEST.

Edit: I assume it's for separating pixel data?

Ugh! I missed that part - just assumed the destination was rotated.

If he is wanting the source to be rotated, that is not possible as that would mean 2 writebacks to the cog in stage 4.

So this is what he was after...
Which means combining these instructions...

AND D,#mask
  where "2" uses mask = %11, "4" uses mask = %1111, "8" uses mask = %11111111
MOV tmp,S
AND tmp,#mask
OR D,tmp

followed by

ROR S,#n
  where n=2/4/8

Cluso99 · 2013-12-21 22:12

potatohead wrote: »

How common/often is this instruction used?

Lots if you are taking pixel streams to be combined, packed into LONGS.

I guess that depends if you are replacing sections in an already formatted stream?
If you are building from scratch, maybe the existing instructions would be capable?

Baggers · 2013-12-22 02:42

Yeah, it was for the source to be rotated post move/and

Cluso, it would be used a lot in games, not just for pixel manipulation but for bit extraction for many other purposes too.
Another game-related use would be extracting character tiles from a packed map. We have a very fast way to get a lot of data from hub into 8 longs, but then extracting those bits would slow it down again. If you were using bytes, it could possibly be faster (I haven't checked) to just read cached bytes than to get the longs, copy to a temp, rotate the bits to somewhere usable and AND it.
As I said, it wasn't just for game purposes. It could be for reading in pre-packed streams of 2 or more bits that were read from a serial pin with the fast comms we'll have, which put the data into longs automatically.

Cluso, the instruction would have equated to the following

  MOV DST,SRC
  AND DST,#3/7/15/255
  ROR SRC,#2/3/4/8

I know it'd be two writes, so I edited the post suggesting just the MOVE+AND in one instruction.

It was an instruction that I thought could have been very useful for optimising code, and I assume cheap in logic, as you're just getting certain bits of the src to put in the dst (assuming only the two instructions here, not the extra rotation/write back to src).
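For anyone following along, the three-instruction sequence above can be modelled in Python to show how it would peel fixed-width fields off a packed long. This is just a sketch of the semantics as described: `extract_field` and `unpack` are made-up names, and returning both values stands in for the two register writes that make the combined instruction awkward in hardware.

```python
MASK32 = 0xFFFFFFFF

def ror32(value, count):
    """Rotate a 32-bit value right by count bits (count taken modulo 32)."""
    value &= MASK32
    count &= 31
    return ((value >> count) | (value << (32 - count))) & MASK32

def extract_field(src, width):
    """Model of MOV DST,SRC / AND DST,#mask / ROR SRC,#width.

    Returns (dst, new_src): two results, mirroring the two writes.
    """
    mask = (1 << width) - 1          # 3, 7, 15 or 255 for widths 2, 3, 4, 8
    dst = src & mask
    new_src = ror32(src, width)
    return dst, new_src

def unpack(long_value, width):
    """Pull every whole width-bit field out of a 32-bit long, lowest first."""
    fields = []
    src = long_value
    for _ in range(32 // width):
        field, src = extract_field(src, width)
        fields.append(field)
    return fields

print([hex(b) for b in unpack(0x12345678, 8)])
print(unpack(0x000000E4, 2)[:4])
```

Dropping the rotate from the combined instruction, as suggested, leaves only the `dst = src & mask` step in one instruction, with the `ror32` done by a separate ROR.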

Cluso99 · 2013-12-22 04:49

There are some instructions that work on nibbles, bytes and words, such as SETNIB, GETNIB, etc. Perhaps these can do what you require?

Basically, what you are doing here is extracting a pair/nibble/byte with zero left extend.
Do you then use that for lookup?
