Propeller II update - BLOG

Heater. · 2011-05-23 07:51

And how big is this CLUT?

Sapieha · 2011-05-23 07:54

Hi Heater.

128x32bits

Heater. wrote: »

And how big is this CLUT?

Heater. · 2011-05-23 08:18

Hmm... I foresee CLUTMM?

jazzed · 2011-05-23 08:28

RossH wrote: »

Anecdotally, a GCC port is said to take 3 - 6 months when done by experienced GCC developers who have working silicon and a fairly "orthodox" architecture to target. A GCC port done by non-GCC developers, to a chip that is not fully defined even in some of its fundamental features - and for a very unorthodox architecture - might be expected to take somewhat longer. A GCC port that has to be re-written to take advantage of new instructions on said unorthodox architecture - or to remove the use of instructions that didn't quite pan out as expected - could be expected to take even longer!

Ross, on May 17th you met the team on-line which has considerable GCC porting experience. I have confidence that their GCC expertise and Bill's enthusiasm and early experiments with his Propeller 2 VM ideas will produce a solution in 6 months or less. This is a direction that Parallax seems intent to take. All it takes is execution.

Your efforts with Catalina will be rewarded, and I will be using your remarkable one-person product in various scenarios over the summer that will be good for both Catalina and GCC.

BigFoot · 2011-05-23 09:37

We couldn't make the Expo this year, but it sure seems like a lot of progress has been made on the Propeller II.
Do any of you know the link to the video of Chip's talk?

It looks like the end of this year is going to be a very exciting time for Parallax and all of us.

JRetSapDoog · 2011-05-23 10:31

Courtesy of John R's thread,

Chip G's Q&A: http://www.ustream.tv/recorded/14872589 (if this doesn't work, see method below)
Presentations: http://www.ustream.tv/recorded/14867090

John's Thread: http://forums.parallax.com/showthread.php?131805-UPEW-Online-Live!
(If using this link, click the video on the lower-left entitled "UPEW 1" for Chip's talk fielding Q's)

BTW: potatohead and Cluso99 have good summaries of Chip's talk on pg. 18 of this current thread:
Page 18: http://forums.parallax.com/showthread.php?125543-Propeller-II-update-BLOG/page18
(The audio is a bit muffled, so you might want to refer to this as you listen/watch)

If there are any other videos of Chip and Co. (UPEW-11) available for viewing, I haven't come across them.

potatohead · 2011-05-23 10:36

If you pick up on the I/O discussion, please post your thoughts. Most of us haven't really put that together yet.

jazzed · 2011-05-23 11:06

potatohead wrote: »

If you pick up on the I/O discussion, please post your thoughts. Most of us haven't really put that together yet.

The DAC and ADC abilities were the main IO topics. ADC resolution would be something like 18 bits, which is pretty impressive. Also, ADCs between pins could be used in differential mode. I don't remember the DAC details.

One item that came out was that 3.3V IOVDD would work faster than 1.8V IOVDD (1ns vs 40ns or something). I really don't know why that would be. It seemed like the "current sink" target was like 40mA on a pin. Not many of us had questions on the IO.

IIRC there are programmable pull-ups, etc. on each IO, so I didn't ask about it again.

Phil Pilgrim (PhiPi) · 2011-05-23 12:04

jazzed wrote: »

One item that came out was that 3.3V IOVDD would work faster than 1.8V IOVDD (1ns vs 40ns or something). I really don't know why that would be.

The more voltage you apply to a CMOS gate, the faster it will charge and, therefore, switch.

-Phil

jazzed · 2011-05-23 13:51

Phil Pilgrim (PhiPi) wrote: »

The more voltage you apply to a CMOS gate, the faster it will charge and, therefore, switch.

Well that explains why 74HCxxx perform so crappy. More recent technologies don't seem to have that problem though.

Dave Hein · 2011-05-23 14:16

I recall in the 70's the original 4000 series CMOS gates had a switching time on the order of a microsecond. They were nice because of the wide power supply voltage range and low power. However, they were useless if you wanted to do anything at high speeds. ECL was the high-speed king at that time. I don't know why they were so slow -- they must have had a high on-resistance and high capacitance. Sorry for going OT.

SSteve · 2011-05-23 15:28

One thing I haven't seen mentioned yet from Chip's presentation: One of the changes (I think having to do with the RAM/ROM interface) opened up some space at the bottom of the section with the fuse bits. Someone asked if they couldn't scoot everything down and add more fuse bits. Chip and Beau seemed to think that was possible. I guess that could allow for better encryption? I'm not sure, but I thought I'd post it here just for completeness' sake.

Rayman · 2011-05-23 15:34

Yes, and Kye raised the point that maybe 64-bits is not enough... I know 128-bit encryption is in use a lot, but not sure if there's a 1-1 correlation between the number of key bits and encryption bits...

Also, I wonder if Parallax has looked to see if they might have export control issues with 128-bit encryption...

Dave Hein · 2011-05-23 15:40

128-bit AES encryption is used in video conferencing. I'm not aware of any export issues with AES, but I do recall that there were export restrictions on DES.

Phil Pilgrim (PhiPi) · 2011-05-23 16:03

I don't know why there would be any export restrictions on mere fuse bits, since the encryption algo would be up to the programmer to provide. As I understand it, the only thing provided in hardware is a way to burn a secret key -- or whatever -- into the fuses. That, in itself, is not encryption technology.

-Phil

Cluso99 · 2011-05-23 16:15

Sapieha: What is the use of a write and decrement? If that is required, then wouldn't a read and decrement be needed also? Sort of a reverse fifo??

All: Chip is focusing on being able to better support higher-level languages by using fewer instructions to support some of the more common requirements. This sounds great to me - I am sure I'll find ways to use them in PASM too! So we will have much more capability in 2KB (512x32) of cog RAM, and 512 bytes (128x32) of CLUT/FIFO.

RossH · 2011-05-23 16:46

jazzed wrote: »

Ross, on May 17th you met the team on-line which has considerable GCC porting experience

Hi jazzed,

I wasn't questioning the experience of the team or their commitment, as much as the commercial risk Parallax might be taking (perhaps unwittingly) in releasing a chip aimed at a professional market before a suitable professional toolset is available. If the Propeller II is branded as a "hobbyist" chip on release it may never recover, however brilliant we here in the forums may think it is. This was one of the problems with the Prop I. Those scathing reviews about the Prop I and its obscure programming language are still out there (nothing is ever forgotten on the internet) - and are often the first (and perhaps only) reviews people find when they first hear about it and go looking for more information.

Ross.

P.S. Thanks for the words of support regarding Catalina. I will be following the progress of the GCC work with interest, even if I end up not actively participating. My own opinion is that which language and compiler Parallax chooses is much less important than their actively supporting a mainstream language (in addition to Spin, of course).

Invent-O-Doc · 2011-05-23 16:48

Making too many changes at the last minute is a good way to introduce errors (a recipe for disaster). Be careful out there! (Although some of these changes sound pretty cool.)

Ariba · 2011-05-23 16:48

Cluso99 wrote: »

Sapieha: What is the use of a write and decrement? If that is required, then wouldn't a read and decrement be needed also? Sort of a reverse fifo??

IMHO Chip needs only to change the RDCLUT instruction to pre-decrement instead of post-increment.

So the stack grows upwards. For a PUSH you write the data and increment the address pointer, for a POP you decrement the address and read the data.
If you fill in a Color lookup table you only need WRCLUT, and it's the natural way to fill it from bottom to top.
If you want to read a single color, you set the address anew anyway. If you need to read the whole CLUT, then you have to do it downward.
If the CLUT RAM is used for additional variables, you need to set the address+1 with SETCLUT before you read the variable - not so nice.
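
A minimal sketch of the scheme described above, in Python rather than PASM since the instruction set was not final at this point. The class and method names are hypothetical, and the 7-bit pointer wrap is an assumption; only the post-increment write and pre-decrement read come from the post.

```python
CLUT_SIZE = 128  # 128 longs, per the "128x32bits" figure quoted in the thread


class UpwardStack:
    """Hypothetical model: write post-increments, read pre-decrements."""

    def __init__(self):
        self.mem = [0] * CLUT_SIZE
        self.ptr = 0

    def push(self, value):
        # WRCLUT-style: store at the pointer, then increment it
        self.mem[self.ptr] = value & 0xFFFFFFFF
        self.ptr = (self.ptr + 1) % CLUT_SIZE  # wrap is an assumption

    def pop(self):
        # Proposed RDCLUT behaviour: decrement the pointer, then load
        self.ptr = (self.ptr - 1) % CLUT_SIZE
        return self.mem[self.ptr]

    def read_var(self, addr):
        # The "not so nice" case: the pointer must be set to addr + 1
        # first, because the read pre-decrements
        self.ptr = (addr + 1) % CLUT_SIZE
        return self.pop()


s = UpwardStack()
s.push(1)
s.push(2)
first, second = s.pop(), s.pop()  # values come back in reverse order
```

The `read_var` helper shows why random variable access is awkward under this scheme: every read costs an extra pointer set with an off-by-one address.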

Andy

Cluso99 · 2011-05-23 18:15

Ariba: Of course you are correct (rd pre dec & wr post inc)- silly me not thinking before I wrote

I have not used a CLUT. Of course we could work around the filling from bottom to top and also setting +1 before reading for variables. However, for variable use, better to have an option to not inc/dec.

So, ideally, if it is easy...
* RDCLUT & WRCLUT
* Each with optional Pre-Increment, Post-Increment, Pre-Decrement, Post-Decrement, No-Increment/Decrement

However, I will live with whatever we get as long as we can access it. This is because we get another 25% cog space which can only be used as variable or stack space provided no clut is required - this is a big increase!! We can also use this for a short LMM type store or overlay space. One thing for sure is we will find lots of ways to exploit it
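
As a quick check of the 25% figure, assuming 4-byte longs, a cog RAM of 512 longs, and the 128-long CLUT quoted earlier in the thread (the variable names are only for illustration):

```python
BYTES_PER_LONG = 4
COG_LONGS = 512   # assumption: cog RAM is 512 longs
CLUT_LONGS = 128  # "128x32bits", as quoted in the thread

cog_bytes = COG_LONGS * BYTES_PER_LONG    # 2048 bytes = 2KB
clut_bytes = CLUT_LONGS * BYTES_PER_LONG  # 512 bytes
extra = CLUT_LONGS / COG_LONGS            # 0.25

print(cog_bytes, clut_bytes, f"{extra:.0%}")  # 2048 512 25%
```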

As for introducing errors, I think (and trust that) Chip & Beau have this well in hand. Chip has proven he is not a risk taker when it comes to silicon. How many other chip manufacturers have no errata in their silicon?

Sapieha · 2011-05-23 19:20

Hi Cluso

They already have FIFO-type RD/WR-CLUT instructions (both increment the pointer) --- left side of my picture.
BUT a STACK type of function needs what I have on the right side of the picture (one increments, the other decrements).

Sorry, but I can't explain it in words - maybe you will understand it from my picture in the PDF file.

Cluso99 wrote: »

Sapieha: What is the use of a write and decrement? If that is required, then wouldn't a read and decrement be needed also? Sort of a reverse fifo??

All: Chip is focusing on being able to better support higher-level languages by using fewer instructions to support some of the more common requirements. This sounds great to me - I am sure I'll find ways to use them in PASM too! So we will have much more capability in 2KB (512x32) of cog RAM, and 512 bytes (128x32) of CLUT/FIFO.

Leon · 2011-05-23 19:37

Cluso99 wrote: »

As for introducing errors, I think (and trust that) Chip & Beau have this well in hand. Chip has proven he is not a risk taker when it comes to silicon. How many other chip manufacturers have no errata in their silicon?

XMOS? The four-core XS1-G4 has no errata and the XS1-L1 and XS1-L2 single-core and two-core devices only have a restriction of a driving impedance of <100R to ground on the JTAG signals to guarantee a logic low, which isn't exactly a design fault.

You asked the question, so you can't complain if you don't like the answer!

Roy Eltham · 2011-05-23 19:37

Chip's office right now:

Chip, Beau & Roy

Kye (Kwabena) & Chip

Also, I have a post in the works with all the details of the planned changes for the CLUT access instructions. Chip gave me the notes and permission to post, just working out the details. I'll post later tonight.

Sapieha · 2011-05-23 20:51

Hi Cluso.

The NEXT ideal solution for this MEM need:

2 sets of RD/WR instructions and 3 pointer regs.

One set for FIFO (circular) = 2 pointers, with RD/WR that increment the pointer registers.
Those pointers need the possibility to wrap around on INC, with a MODE setting to wrap at 32/64/128 positions, starting from position 000.

And the other set:
A second set with WR-decrement and RD-increment and one pointer register that defaults to 1FF but can be changed.

That gives the possibility to even divide this MEM into 2 parts if needed: ONE for FIFO (circular) and the other part as STACK space.
And as the Propeller does not have interrupts, in most cases the stack space does NOT need to be so BIG.

MOST ideal:

In addition to what I described in RED,

+ 1 extra set (3 sets in total) of RD/WR and one more address pointer for NON-incremental read/write at any desired position.

Leaving aside the CLUT, which is the default usage,
ONLY that can give this MEM full usability without waste of silicon.

Cluso99 wrote: »

I have not used a CLUT. Of course we could work around the filling from bottom to top and also setting +1 before reading for variables. However, for variable use, better to have an option to not inc/dec.

So, ideally, if it is easy...
* RDCLUT & WRCLUT
* Each with optional Pre-Increment, Post-Increment, Pre-Decrement, Post-Decrement, No-Increment/Decrement

However, I will live with whatever we get as long as we can access it. This is because we get another 25% cog space which can only be used as variable or stack space provided no clut is required - this is a big increase!! We can also use this for a short LMM type store or overlay space. One thing for sure is we will find lots of ways to exploit it

As for introducing errors, I think (and trust that) Chip & Beau have this well in hand. Chip has proven he is not a risk taker when it comes to silicon. How many other chip manufacturers have no errata in their silicon?

P.S. That GIVES the possibility to RUN 3 tasks IN one COG with 3 different usages of THIS MEM. And as we can already run tasks in Prop I, it is doable.

Roy Eltham · 2011-05-23 21:48

I got permission from Chip to post the planned changes to the CLUT memory access instructions.

First, there will be two pointers into the CLUT memory, A and B. This allows you to have two stacks. Perhaps one normal one and one expression solver one. You can even have the two stacks build in opposite directions (one at the end building down and the other at the beginning building up).

In the lists below, # is a constant and D is a register.

Here are the instructions to push/pop to/from the CLUT:

PUSHA    D/# - write # via pointer A, and post increment A
PUSHB    D/# - write # via pointer B, and post increment B
PUSHDNA  D/# - pre decrement A, then write # via pointer A
PUSHDNB  D/# - pre decrement B, then write # via pointer B
POPA     D   - pre decrement A, and read via pointer A
POPB     D   - pre decrement B, and read via pointer B
POPUPA   D   - read via pointer A, and post increment A
POPUPB   D   - read via pointer B, and post increment B
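
The four push/pop flavours listed above can be modelled in a few lines of Python. This is a sketch only: the `Clut` class and its method names are hypothetical, and the 7-bit wrap on push/pop is an assumption here.

```python
MASK = 0x7F  # assumption: 7-bit pointers addressing 128 longs


class Clut:
    """Hypothetical model of the two-pointer CLUT push/pop instructions."""

    def __init__(self):
        self.mem = [0] * (MASK + 1)
        self.sp = {"A": 0, "B": 0}

    def push(self, p, value):    # PUSHA / PUSHB: write, then post-increment
        self.mem[self.sp[p]] = value & 0xFFFFFFFF
        self.sp[p] = (self.sp[p] + 1) & MASK

    def pushdn(self, p, value):  # PUSHDNA / PUSHDNB: pre-decrement, then write
        self.sp[p] = (self.sp[p] - 1) & MASK
        self.mem[self.sp[p]] = value & 0xFFFFFFFF

    def pop(self, p):            # POPA / POPB: pre-decrement, then read
        self.sp[p] = (self.sp[p] - 1) & MASK
        return self.mem[self.sp[p]]

    def popup(self, p):          # POPUPA / POPUPB: read, then post-increment
        value = self.mem[self.sp[p]]
        self.sp[p] = (self.sp[p] + 1) & MASK
        return value


c = Clut()
c.push("A", 10)
c.push("A", 20)
top = c.pop("A")      # last value pushed with PUSHA
c.pushdn("B", 7)      # B wraps from 0 to 127 before writing
down = c.popup("B")   # reads back the value and returns B to 0
```

Note the pairing: `push` undoes with `pop`, and `pushdn` undoes with `popup`, so each pair shares one pointer convention.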

Here are the instructions for manipulating the pointers (for add and sub it wraps at 7 bits):

SETSPA   D/#  - write pointer A
SETSPB   D/#  - write pointer B
GETSPA   D    - read pointer A
GETSPB   D    - read pointer B
ADDSPA   D/#  - add to pointer A
ADDSPB   D/#  - add to pointer B
SUBSPA   D/#  - subtract from pointer A
SUBSPB   D/#  - subtract from pointer B

And finally here are the call/return instructions:

CALLA    D/# - write address and C & Z flags to the CLUT at A then increment A, and jump to address
CALLB    D/# - write address and C & Z flags to the CLUT at B then increment B, and jump to address
CALLAD   D/# - same as CALLA, but executes the two instructions after this one
CALLBD   D/# - same as CALLB, but executes the two instructions after this one
RETA         - decrement A then read the value in the CLUT pointed to by A, and jump to that address. 
               if WC and/or WZ are specified then restore those flags before jumping.
RETB         - decrement B then read the value in the CLUT pointed to by B, and jump to that address. 
               if WC and/or WZ are specified then restore those flags before jumping.
RETAD        - same as RETA, but executes the two instructions after this one. 
RETBD        - same as RETB, but executes the two instructions after this one.
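
A rough Python model of CALLA/RETA as described above. Several details are assumptions rather than anything stated in the notes: that the saved address is the one following the call, that the address occupies the low 9 bits of the long, and that Z and C sit in bits 9 and 10. The delayed variants (CALLAD, RETAD) are not modelled.

```python
MASK = 0x7F  # 7-bit stack pointer (128 longs)


class Cog:
    """Hypothetical model of CALLA / RETA using pointer A."""

    def __init__(self):
        self.clut = [0] * 128
        self.spa = 0
        self.pc = 0
        self.c = False
        self.z = False

    def calla(self, target):
        ret = self.pc + 1  # assumption: return address is the next instruction
        # Assumed packing: address in bits 0..8, Z in bit 9, C in bit 10
        word = (ret & 0x1FF) | (int(self.z) << 9) | (int(self.c) << 10)
        self.clut[self.spa] = word
        self.spa = (self.spa + 1) & MASK
        self.pc = target

    def reta(self, wc=False, wz=False):
        self.spa = (self.spa - 1) & MASK
        word = self.clut[self.spa]
        if wz:  # flags are restored only when requested
            self.z = bool((word >> 9) & 1)
        if wc:
            self.c = bool((word >> 10) & 1)
        self.pc = word & 0x1FF


cog = Cog()
cog.pc, cog.c, cog.z = 0x10, True, False
cog.calla(0x40)          # saves 0x11 plus flags, jumps to 0x40
cog.c, cog.z = False, True
cog.reta(wc=True)        # restores C only, Z keeps its current value
```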

Cluso99 · 2011-05-23 21:48

Thanks for the pdf Sapieha. Now I understand what you are saying...

STACK: When using the clut as a stack, only 1 pointer register is required, and it needs to be push (write & post increment) and pop (pre decrement & read).

FIFO: When using the clut as a fifo, it needs two pointers. One for write with post increment and one for read with post increment. Wraparound is also required. (e.g. the FullDuplexSerial object uses two 16 byte fifos. It would be OK to use longs for each byte.)
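
The two-pointer FIFO described here might look like the following Python sketch. The names are hypothetical, and the `count` field is software bookkeeping added for the example (an assumption, not something proposed for the hardware); the 16-entry default depth mirrors the FullDuplexSerial buffers mentioned above.

```python
class ClutFifo:
    """Hypothetical FIFO in CLUT memory: separate read and write pointers,
    both post-incrementing and wrapping at the FIFO depth."""

    def __init__(self, base=0, depth=16):
        self.mem = [0] * 128   # 128 longs of CLUT
        self.base = base       # first long of the FIFO region
        self.depth = depth
        self.wr = 0            # write offset, post-incremented
        self.rd = 0            # read offset, post-incremented
        self.count = 0         # assumption: software-side fill count

    def put(self, value):
        if self.count == self.depth:
            raise OverflowError("fifo full")
        self.mem[self.base + self.wr] = value
        self.wr = (self.wr + 1) % self.depth
        self.count += 1

    def get(self):
        if self.count == 0:
            raise IndexError("fifo empty")
        value = self.mem[self.base + self.rd]
        self.rd = (self.rd + 1) % self.depth
        self.count -= 1
        return value


f = ClutFifo(depth=4)
for i in range(4):
    f.put(i)
head = f.get()                      # oldest entry comes out first
f.put(4)                            # write pointer wraps to slot 0
rest = [f.get() for _ in range(4)]
```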

And you now say that it would be good to be able to set separate versions of these. i.e. 1 stack and 2 fifos.

kuroneko · 2011-05-23 22:08

Roy Eltham wrote: »

Here are the instructions to push/pop to/from the CLUT:

PUSHA    D/# - write # via pointer A, and post increment A
PUSHB    D/# - write # via pointer B, and post increment B
PUSHDNA  D/# - pre decrement A, then write # via pointer A
PUSHDNB  D/# - pre decrement B, then write # via pointer B
POPA     D   - pre decrement A, and read via pointer A
POPB     D   - pre decrement B, and read via pointer B
POPUPA   D   - read via pointer A, and post increment A
POPUPB   D   - read via pointer B, and post increment B

Why this inconsistency (for want of a better word)? I.e. we now get an empty ascending stack and a full descending stack. Not that it matters much (to me), just curious ...

Update: OK, I can see the symmetry at h/w level. No further questions.

Cluso99 · 2011-05-23 22:58

Roy: Thanks for the info.

STACKS: This is great for stacks.

VARIABLES: To use the clut as variable space, we really just need to set the clut pointer and read/write with post inc/dec. While there is no need to do the post inc/dec, it is fine for copying and not an issue if using it randomly as you need to set before each read/write. So this is handled with what you have here.

FIFOS: No help is provided here and it will take more instructions to build a fifo using the clut than it does to use hub or cog ram. Unfortunately I cannot say for sure how much this could/would be used.

Alternative suggestion:
Provide a Mode register for A & B stack pointers as follows (also adds a non-increment/decrement mode) ...

SETSPMA   D/#      - write SP mode register A
SETSPMB   D/#      - write SP mode register B
 
   where the following sets of bits set the increment/decrement mode
00  =  No increment or decrement
01  =  Post increment on read & write (useful for storing or saving a group of longs)
10  =  Pre increment on read & Post decrement on write (stack grows down for normal Push & Pop)
11  =  Pre decrement on read & Post increment on write (stack grows up for reverse direction Push & Pop)
 
   where the following sets of bits set the wrapping level (allows the stack/fifo depth to wrap)
xxxxxxx  = 1 means this bit is used (not wrapped) in the SP register 
(e.g. 0001111 means wraps at 16 longs, the upper 3 bits remain unchanged in the SP pointer register on inc/dec)
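
To make the suggestion concrete, here is a small Python sketch of the mode bits and the wrap mask. Everything in it (the `step` helper, the `ModalPointer` class) is hypothetical and only models the proposal as worded above.

```python
ADDR_BITS = 0x7F  # 7-bit pointer, 128 longs


def step(sp, delta, wrap_mask):
    """Move sp by delta; only the bits set in wrap_mask are allowed to change."""
    kept = sp & ~wrap_mask & ADDR_BITS
    moved = (sp + delta) & wrap_mask
    return kept | moved


class ModalPointer:
    """Hypothetical pointer whose inc/dec behaviour follows a 2-bit mode."""

    def __init__(self, mem, mode, wrap_mask=ADDR_BITS, sp=0):
        self.mem, self.mode, self.wrap_mask, self.sp = mem, mode, wrap_mask, sp

    def _move(self, delta):
        self.sp = step(self.sp, delta, self.wrap_mask)

    def write(self, value):
        self.mem[self.sp] = value
        if self.mode in (0b01, 0b11):  # post increment on write
            self._move(+1)
        elif self.mode == 0b10:        # post decrement on write
            self._move(-1)

    def read(self):
        if self.mode == 0b10:          # pre increment on read
            self._move(+1)
        elif self.mode == 0b11:        # pre decrement on read
            self._move(-1)
        value = self.mem[self.sp]
        if self.mode == 0b01:          # post increment on read
            self._move(+1)
        return value


mem = [0] * 128
# Mode 11 with a 16-long wrap window; the upper 3 pointer bits stay at 010
p = ModalPointer(mem, 0b11, wrap_mask=0b0001111, sp=0b0101111)
p.write(5)            # stored at 47, pointer wraps to 32 instead of 48
after_write = p.sp
value = p.read()      # pre-decrement wraps back to 47
```

With mask `0001111` the pointer cycles through 16 longs while the upper three bits pick which 16-long window it lives in.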

Perhaps you could think about this and maybe ask Chip what he thinks?

Cluso99 · 2011-05-23 23:09

Roy: Are your push & pop instructions meant to be this???

PUSHA     D/#  - write # via pointer A, and post DECrement A  (stack grows down)
PUSHB     D/#  - write # via pointer B, and post DECrement B
PUSHUPA   D/#  - write # via pointer A, and post INCrement A  (stack grows up)
PUSHUPB   D/#  - write # via pointer B, and post INCrement B
POPA      D    - pre INCrement A, and read via pointer A
POPB      D    - pre INCrement B, and read via pointer B
POPDNA    D    - pre DECrement A, and read via pointer A
POPDNB    D    - pre DECrement B, and read via pointer B

I may need to correct my post above, but hopefully you get the idea.

Roy Eltham · 2011-05-23 23:29

Cluso99,
No, the description I posted is how it's planned to work. It is intentional. Normal Push and Pop work with push doing a post-increment and pop doing a pre-decrement.
If you were to set both pointer A and pointer B to 0, and then use PUSHA and POPA at the same time as PUSHDNB and POPUPB, then the two stacks would grow away from each other (one growing up from 0, and the other growing down from the end of the CLUT because of the wrap).
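
The two-stack arrangement can be traced with a short Python sketch. The function names mirror the mnemonics but the code is only an illustrative model, assuming 7-bit pointers over 128 longs.

```python
MASK = 0x7F            # 7-bit pointers
mem = [None] * 128     # None marks an untouched long
sp = {"A": 0, "B": 0}  # both pointers start at 0


def pusha(v):          # write, then post-increment A
    mem[sp["A"]] = v
    sp["A"] = (sp["A"] + 1) & MASK


def popa():            # pre-decrement A, then read
    sp["A"] = (sp["A"] - 1) & MASK
    return mem[sp["A"]]


def pushdnb(v):        # pre-decrement B, then write
    sp["B"] = (sp["B"] - 1) & MASK
    mem[sp["B"]] = v


def popupb():          # read, then post-increment B
    v = mem[sp["B"]]
    sp["B"] = (sp["B"] + 1) & MASK
    return v


for i in range(3):
    pusha(("a", i))
    pushdnb(("b", i))

used = [i for i, v in enumerate(mem) if v is not None]
print(used)  # [0, 1, 2, 125, 126, 127]

last_a = popa()    # ("a", 2)
last_b = popupb()  # ("b", 2)
```

Stack A occupies the low addresses and stack B the high ones, so they only collide once the combined depth reaches 128 longs.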
