What would you want more of, cogs or RAM?

stevenmess2004 · 2008-04-19 22:38

Parallax have indicated many times that they want to keep all the cogs the same, so having cogs dedicated to this type of thing is unlikely. However, if they fix the video hardware so that it can shift in as well as shift out, then you would be able to do very fast transfers between props or to external memory. Also, with 64 I/O pins you will be able to afford to use more pins for the interface.

The cogs only have 512 longs of RAM. This only allows for fairly short programs, but the architecture often makes up for this. The LMM (Large Memory Model) is an idea proposed by Bill Henning to allow longer asm programs by fetching instructions from the hub using a simple loop. There is a good explanation on the Propeller wiki here: propeller.wikispaces.com/Large+Memory+Model.
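To make the LMM idea a bit more concrete, here is a toy Python model of the fetch-and-execute loop. It is only a sketch: the instruction names (`add`, `jmp`, `halt`) and the `run_lmm` function are made up for illustration, and a real LMM kernel is a handful of PASM instructions, not Python.

```python
# Toy model of a Large Memory Model (LMM) loop.
# All names here are hypothetical; this is not Bill Henning's actual kernel.

def run_lmm(hub_program, max_steps=1000):
    """Fetch one instruction at a time from 'hub' memory and execute it in the 'cog'."""
    acc = 0      # stand-in for a cog register
    pc = 0       # program counter pointing into hub memory
    steps = 0
    while pc < len(hub_program) and steps < max_steps:
        op, arg = hub_program[pc]   # fetch from hub (the slow part on real hardware)
        pc += 1                     # advance to the next hub instruction
        if op == "add":
            acc += arg
        elif op == "jmp":           # a jump just rewrites the hub program counter
            pc = arg
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown op {op!r}")
        steps += 1
    return acc

# The program can be far longer than 512 entries because it never has to fit in cog RAM.
program = [("add", 1)] * 1000 + [("halt", 0)]
print(run_lmm(program, max_steps=2000))
```

The point of the model is only that the program lives in hub memory and the cog holds nothing but the small loop, which is why program size stops being limited by the 512 longs.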

I very much doubt that they will expand the size of the cog memory, as this would require a change in the instruction set or wider opcodes (how about a 64-bit prop?).

Edit: Mike beat me

hippy · 2008-04-19 23:22

Extending Cog RAM would be possible using a banked memory scheme, but Parallax have given no indication of wanting to follow that route, and LMM is as good or better in most cases. A banked scheme requires each Cog to have its memory increased, whether utilised or not ( with an increase in silicon footprint ), while LMM allows each Cog to effectively extend its memory as and if it chooses to.

hinv · 2008-04-19 23:39

Is it their goal to keep all of the I/Os the same? I have asked before for some of the I/Os to run at the core's 1.8 V for ease of attaching memory and for prop-to-prop communications, which I presume would be lower power and higher speed than 3.3V. If the idea is to keep all of the I/O pins the same as well as the cogs, maybe there is a way to switch the step-up to 3.3V on and off? I too would like to see a shift-in on the video hardware, as it would aid in communications.
I would also like to see another 5 pages in this thread!

Mike Green · 2008-04-19 23:51

If you read the rest of this thread (long as it may be), you will see some of the reasons for these decisions. If you want to work with 1.8V devices, you can switch between low output and input mode with a pullup to 1.8V. There's very little advantage to making some I/O pins work at 3.3V and others at 1.8V. There's no "step-up" to 3.3V. The output transistors are made differently (to withstand 3.3V) and connected to the 3.3V power bus.

hinv · 2008-04-20 02:15

Back on page 11 (http://forums.parallax.com/showthread.php?p=617536) I requested 1.8V on some of the I/O lines because Chip stated that there was an advantage in LVDS transition times. You said "there is very little advantage"; can you explain? Why wouldn't it be a great deal better for high-speed signals, memory interfacing, and prop-to-prop communications? Do you have some insight as to how fast the video circuits/counters will run?

Thanks,
Doug

woodrowg · 2008-04-20 02:23

Re: High speed Serial Port
OK, forget the dedicated COG idea. Don't put the serial interface in a COG. My original suggestion was to make it part of the HUB: just build it and give it two or three pins. If you need it, turn it on. If not, route the I/O pins for some other function.

To Mike Green: I don't understand why one would choose to restrict HUB RAM size until you reach the end of a 32-bit address space. To gain flexibility and index larger arrays, why not increase the allocated real estate for HUB RAM, particularly if one has taken the decision to port a C compiler?

What is wrong with 16k x 32 (or 32k x 32) bits of on chip RAM?

I think I was puzzled by the statement in the ImageCraft readme file that the compiled code would execute much faster than Spin code? It may be true, but if you can't keep the instruction pipeline fed, it really does not translate into meaningful performance.
wg

stevenmess2004 · 2008-04-20 02:28

Just thought of something, the current prop does kind of have an intercog communication mechanism although it is very limited and can't really be used for much but it may be able to be extended in the next prop.

When using the video generator you can select a timer from another cog to use for the audio modulation. Maybe, this could be extended to passing data between cogs. However, it would still be slower than using the hub because it would take 32 cycles to transfer a long using the timers and only 16 cycles using the hub.
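For a rough feel of what those cycle counts mean, here is a small back-of-the-envelope calculation. The 80 MHz clock is my assumption (a common Propeller setup), not something stated above; the 16 and 32 cycle figures are the ones from this post.

```python
CLOCK_HZ = 80_000_000  # assumption: typical Propeller system clock

def bytes_per_second(cycles_per_long, clock_hz=CLOCK_HZ):
    """Throughput when one 32-bit long (4 bytes) moves every `cycles_per_long` clocks."""
    longs_per_second = clock_hz // cycles_per_long
    return longs_per_second * 4

hub_rate = bytes_per_second(16)     # via the hub
timer_rate = bytes_per_second(32)   # via the timer/video path
print(hub_rate, timer_rate, hub_rate / timer_rate)
```

Under that assumption the hub route moves twice as much data per second, which is just the 32:16 ratio restated.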

cgracey · 2008-04-20 02:31

About this 1.8V vs. 3.3V I/O matter on the next Propeller:

Each set of 8 I/O pins will have their own VIO pin (VDD for I/O). The pins can do lots of things at VIO=3.3V, but can still function well as digital-only I/O's at VIO=1.8V. At 1.8V, there are lower slew rates and drive strengths for outputs, but the digital input threshold is still around VIO/2, or 0.9V.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Chip Gracey
Parallax, Inc.

Mike Green · 2008-04-20 02:42

woodrowg,
The Prop II is already expected to have 256K bytes of on-chip hub RAM (64K x 32 bits) plus 2K bytes (512 x 32 bits) for each of 16 cogs = 32K bytes.
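The arithmetic behind those figures can be checked in a few lines. This is only a sanity check of the numbers quoted above (64K x 32 bits of hub RAM, 512 x 32 bits per cog, 16 cogs); the variable names are mine.

```python
BYTES_PER_LONG = 4  # a long is 32 bits

hub_bytes = 64 * 1024 * BYTES_PER_LONG      # 64K longs of hub RAM
cog_bytes = 512 * BYTES_PER_LONG            # 512 longs per cog
all_cogs_bytes = 16 * cog_bytes             # 16 cogs
total_bytes = hub_bytes + all_cogs_bytes

print(hub_bytes // 1024, "KB hub")
print(cog_bytes // 1024, "KB per cog")
print(all_cogs_bytes // 1024, "KB across all cogs")
print(total_bytes // 1024, "KB in total")
```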

Why do you think the pipeline might not be fed adequately?

hinv · 2008-04-20 03:52

This sounds great for easy interfacing to 1.8V devices. Do you mean by lower slew rate (usually measured in volts per microsecond) that it is as slow or slower than a 3.3V output, or can you drive them faster, in terms of bit rate?
What is the projected top end for VIO? If, by chance, it is 5V, that could eliminate a lot of external resistors when interfacing to 5V devices.

Mike Green · 2008-04-20 04:13

Chip has made it clear that the manufacturing process is not capable of producing devices that can withstand a 5V input. I suspect the absolute maximum VIO is around 4V with device damage occurring around there.

Slew rate affects the noise sensitivity of a signal. A lower slew rate means that there's less ringing at a given data rate and less distortion of the pulses. You can have a higher data rate on a given type of interconnection (wiring), and there's less radiated noise. It does mean you can drive a given type of transmission line faster or use a longer line with the same reliability. It won't affect the bit rate possible with the chip since that will be limited by the chip's clock and instruction rate, not the maximum data rate of the I/O drivers.

Consider the example of RS232 or RS485 drivers with slew rate control. For a given type of communications cabling, you can indeed get longer range or higher data rates if you control the slew rate. If you're going to run long PCB traces or run signals from board to board at high speeds, you will get fewer errors and have less need for expensive interconnect cables with tightly controlled impedances if you use lower slew rates (like 1.8V LVDS).

hinv · 2008-04-20 04:30

Thanks Mike for the explanation. Thanks Chip for 1.8V-capable I/O. I am really impressed.

davran · 2008-04-20 14:55

I exposed a Prop to some RF (between 10^8 and 10^9 Hz) and managed to make it fail.

The COP micros from National Semi. have EMI reduction (presumably they hold the patent on it).

Might the next Prop have EMI reduction??

Peter Jakacki, I am sorry if I agitated you. Anyway, as you stated, I am a Prop fellow. The link to your website really impressed me.

And I just realized that you won the 2nd prize. Congratulations! What is the prize, can you say?

By the way, my code has a problem:

SHL WORDDATA, #10 WC

The carry flag doesn't mirror the 10th bit (from the left); it mirrors the first bit (MSB). I guess this is a feature of Spin. If so, how can I get the 10th bit in one instruction cycle?

Ken's post is really the story of what I lived through a couple of weeks ago!

ken said...
...
I work in product development for an automotive supplier. I have used the Propeller on several occasions to very quickly put together concepts that management is impressed with. One such product was an MP3 player with touch screen interface. But once they learn what I used for a processor, they say "You can't use that! It's not automotive approved! You can't program it in C! (you can soon!) And of course it costs too much, we can get 32 bit processors for under $5!" But all of this does not change the fact that I was able to put together something that would have taken a lot more time and money to develop by other means. So even though we might not use this processor in our final production design, it was great for throwing together a concept quickly. For me, this is one thing that the Prop excels at.

I think the direction for the PropII is right on the money. Why have dedicated hardware when you can emulate it with software just as effectively? Everything I have wished the Prop had is planned for the Prop II, and I'm excited and looking forward to the day it becomes available.

Love it or hate it, the Prop is a different animal. Use it or don't use it, but don't wish it were something it's not.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
"Give a lie twenty-four hours start, and it will take a hundred years to overtake it.
" (C.F. Dixon-Johnson, British author of the 1916 book, "The Armenians," appalled over the deceitful practices of his book's subject.)

www.tallarmeniantale.com/

Post Edited (davran) : 4/20/2008 5:57:02 PM GMT

Sapieha · 2008-04-20 15:55

Hi All.

I have mentioned more memory and serial I/O in other threads.
My designs do not interfere with the Propeller's architecture, but I have had no response from Parallax.

Regards

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Nothing is impossible, there are only different degrees of difficulty.

Sapieha

hippy · 2008-04-20 16:37

davran said...
by the way my code has problem ;

SHL WORDDATA ,#10 WC THE CARRY flag doesnt mirror the 10th bit.

it mirrors the first bit (msb), i guess it is feature of spin, if so how can i get the 10th bit.

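That's a feature of the Propeller instruction set, not of Spin: with WC, SHL puts the original bit 31 into carry and SHR puts the original bit 0 there ( so getting bit 10 needs a shift right ). Two alternatives: shift right by 10 then by 1 ( or similar sequences ), or 'test' against a bit mask ...

        shr     worddata, #10           ' bring bit 10 down to bit 0
        shr     worddata, #1 WC         ' C = old bit 0 = original bit 10 ( worddata is modified )

        test    worddata, k_bit10 WC    ' C = parity of ( worddata AND mask ) = bit 10; worddata unchanged
        :
k_bit10 long    |< 10                   ' mask with only bit 10 set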

It would be better to post programming questions in a new thread as they are off-topic to this one.

ImageCraft · 2008-04-20 23:19

woodrowg said...
...
To Mike Green: I don't understand why one would choose to restrict HUB RAM size until you reach the end of a 32 bit address space? Therefore, to gain flexibility and index larger arrays, why not increase the allocated real estate for HUB RAM, particularly if one has taken the decision to port a C compiler.

What is wrong with 16k x 32 (or 32k x 32) bits of on chip RAM?

I think I was puzzled by the statement in the ImageCraft readme file that the compiled code would execute much faster than Spin code? It may be true, but if you can't keep the instruction pipeline fed, it really does not translate into meaningful performance.
wg

Woodrowg, I think there is some fundamental misunderstanding going on. If I read your post right, let me just assure you that ICC is not placing any unnecessary restriction on the use of COG or HUB RAM. If you are referring to why the current Propeller only has 32K of RAM, well, I am sure Parallax has a good (internal) explanation for why they make certain decisions.

I am also unclear on where the speed of Spin vs. C and the instruction pipeline comparison come in. Spin is a bytecode interpreter. It's said that on average it's about 40x slower than native COG code. ICC uses an LMM, so most instructions are native Propeller instructions, except that they live in HUB RAM and have to be fetched into the COG RAM, then executed. This carries a penalty of anywhere from 10x slower to 5x slower than native COG code.

And while you are probably aware of this, for the completeness of the discussion: the reason high-level languages generally do not generate native COG code is that programs in a COG are limited to 2K bytes (512 longs) in size. This is an architectural limitation and can only be gotten around by using either an interpreter (Spin) or an LMM kernel (ICC and probably some others like Forth etc.)
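
To put the slowdown factors quoted above side by side, here is a rough calculation. The 20 MIPS native figure is my assumption (80 MHz clock, 4 clocks per instruction); the 40x, 10x and 5x factors are the ones from this post, and the function name is made up.

```python
NATIVE_MIPS = 20.0  # assumption: 80 MHz / 4 clocks per instruction

def effective_mips(slowdown, native_mips=NATIVE_MIPS):
    """Instruction rate after applying a 'times slower than native' factor."""
    return native_mips / slowdown

spin_mips = effective_mips(40)        # bytecode interpreter
lmm_worst_mips = effective_mips(10)   # LMM, pessimistic end
lmm_best_mips = effective_mips(5)     # LMM, optimistic end

print(spin_mips, lmm_worst_mips, lmm_best_mips)
print(lmm_worst_mips / spin_mips, lmm_best_mips / spin_mips)
```

On those numbers LMM code would run somewhere between 4x and 8x faster than Spin, whatever the native rate is assumed to be.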

KyroMaster · 2008-07-19 15:43

I don't know whether this is possible at all, but could future Propeller chips support floating-point calculations with an FPU, also supporting things like hardware division?
Then it could be used for DSP applications as well as audio decoding, MP3, advanced 3D graphics etc. The Propeller architecture IMHO has lots of CPU power and many cores; if proper (floating-point) calculation were possible, it could extend its abilities even more. For DSP usage there could be a simple A/D converter (and possibly a D/A converter, too) like most other microcontrollers already have (AVR/ARM).
Of course this is just an idea; will any of these features be in future Propeller releases?

Mike Green · 2008-07-19 15:55

There's no plan to build in floating point or hardware division. There will be a hardware multiplier in Propeller II. There's just not enough need for floating point for it to be built-in as hardware. There's a very well written and reasonably fast software floating point library already available. It could be rewritten to use the hardware multiplier and possibly included in the ROM in a "release 2" Propeller II. Remember that code doesn't run from ROM. It has to be copied to cog RAM to execute. The stated plans are to include the current "stuff" (bootloader, updated Spin interpreter, font tables, transcendental tables) in the ROM for the first release, then add code to make the Propeller self-supporting like a simple editor and compiler/assembler for a later release. This would be masked ROM, so "first release" parts could not be updated.

Note that DSP-type applications would normally use scaled integer arithmetic anyway rather than floating point. It's much faster even when completely done in hardware. Have a look at the voice synthesis programs that Chip did. They implement generalized voice synthesis with the ability to dynamically set the tone and the stereo position of each of several voices.
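
As an illustration of what scaled integer arithmetic looks like, here is a minimal fixed-point sketch in Python. The 16.16 format and the helper names are my own choices for the example and are not taken from Chip's code or any Propeller library.

```python
FRAC_BITS = 16            # assumption: 16.16 fixed point (16 integer bits, 16 fraction bits)
ONE = 1 << FRAC_BITS      # the integer that represents 1.0

def to_fixed(x):
    """Convert a float to its scaled-integer representation."""
    return int(round(x * ONE))

def to_float(f):
    """Convert a scaled integer back to a float, for display only."""
    return f / ONE

def fixed_mul(a, b):
    """Multiply two 16.16 values: integer multiply, then shift the scale back out."""
    return (a * b) >> FRAC_BITS

gain = to_fixed(0.5)
sample = 1000                           # a plain integer audio sample
scaled = (sample * gain) >> FRAC_BITS   # apply the gain using only integer operations

print(scaled, to_float(fixed_mul(to_fixed(1.5), to_fixed(2.0))))
```

Everything in the inner loop is an integer multiply and a shift, which is the kind of work a hardware multiplier handles well.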

Post Edited (Mike Green) : 7/19/2008 4:01:21 PM GMT

Mike Huselton · 2008-07-19 16:22

And then there is always an FPU co-processor like the www.parallax.com/Store/Components/AllIntegratedCircuits/tabid/154/CategoryID/82/List/0/SortField/0/Level/a/ProductID/401/Default.aspx to offload complex floating-point and string manipulation while the Prop does other things.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
JMH

DNAMusic · 2008-07-21 05:17

Mo' Memry' Please Sir!

waltc · 2008-07-21 06:10

1) 8-16k of RAM per cog.

2) A 32bit external memory bus or at least a variant with it.

3) Get rid of the built-in Spin and move it to an EEPROM, or just turn it into a compiled language like Occam and integrate it into Eclipse. Having a built-in development environment IMO only made sense when computing resources were limited to a dumb terminal and an RS-232 connection. With a powerhouse like the Prop and today's PCs, I don't see the use of it.

My 2 cents.

Brian L · 2008-07-21 07:01

I've also been wondering about why there's going to be an on-chip development environment included. Nowadays even homeless people have a PC, so what's the point? Why would a person really need this?

I'm not knocking the idea. I just figured there must be some reason I just can't think of.

Cluso99 · 2008-07-21 08:59

Built in spin only takes 2K bytes in the Prop I.

There is probably plenty of ROM space in the PropII to put in an IDE and no-one is forcing you to use it (I won't). However, I think it probably has huge advantages for the school education market. All you need is a monitor, keyboard and mouse - real cheap second hand ones these days!!! And if the protoboard has a SD card slot, you have everything a school-kid would want. Actually, the more I think about it, the more I like it.

As for more memory in the cog - yes it would be nice, but the instruction set almost precludes it because of 9-bit addressing, unless you complicate the prop with bank switching. Lots of hub RAM and 16 cogs will make a huge difference. Hopefully there will be some improvements with hub access - Chip has hinted at this. I think he mentioned single cycle hub access, so 16 cogs will still be a 16-clock round robin, and there was a hint of looking at higher priority access.
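
To show why the 9-bit addressing pins cog memory at 512 longs, here is a small sketch that packs and unpacks a 32-bit instruction word. The field layout (6-bit opcode, 4 effect bits, 4 condition bits, 9-bit destination, 9-bit source) is how I understand the current Propeller's encoding; treat it, and the function names, as assumptions for illustration.

```python
# Assumed Propeller 1 layout, high bits to low: opcode(6) zcri(4) cond(4) dest(9) src(9)

def encode(opcode, zcri, cond, dest, src):
    """Pack the five fields into one 32-bit instruction word."""
    return ((opcode & 0x3F) << 26) | ((zcri & 0xF) << 22) | ((cond & 0xF) << 18) \
        | ((dest & 0x1FF) << 9) | (src & 0x1FF)

def decode(instr):
    """Split a 32-bit instruction word back into its fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,
        "zcri": (instr >> 22) & 0xF,
        "cond": (instr >> 18) & 0xF,
        "dest": (instr >> 9) & 0x1FF,
        "src": instr & 0x1FF,
    }

ADDRESSABLE_LONGS = 1 << 9    # a 9-bit field can name 512 registers
word = encode(0x3F, 0xF, 0xF, 511, 511)
print(ADDRESSABLE_LONGS, hex(word), decode(word))
```

With the fields already filling all 32 bits, a larger cog address would have to come from somewhere else, hence the talk of bank switching or wider opcodes.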

A fast single assembler write instruction for a block move (variable length) from hub to RAM would be real nice.

With 16 cogs, off chip memory could be managed in a single cog.

OK, when can I have one???

Praxis · 2008-07-21 10:13

My thoughts,

Each cog (regardless of the final count) has a shift register that allows the input, output and clock to be connected to the IO pins.

Clock selection for master/slave operation i.e. master clock out, slave clock in.

A clock divider for the shift register (master mode).

hal2000 · 2008-07-21 13:32

I vote for
Option 2: 8 cogs with 256KB of hub RAM. Hub access once every 8 clocks

and Spin compiled, not interpreted.

I think I am late with my comment
;-)

Post Edited (hal2000) : 7/21/2008 1:38:19 PM GMT

evanh · 2008-07-21 14:35

I too was initially wary about the use of on-chip mask ROM but after reviewing the die image I realised how big the feature diffs were and how sensible it is to put some common data in there. Here's one image I found - Propeller die image

The mask ROM is located in the eight square blocks lined up across the middle left and middle right, making 32kByte. Compare that with the eight SRAM blocks filling the bottom third of the die, also making 32kBytes. It's about an 8:1 ratio.

Evan

Ale · 2008-07-21 14:35

hal2000,

I think the voting closed about a year ago, and 16 cogs with 256 KB RAM won. I think the original choice was 16/128 or 8/256, but it was later revised to 16/256. Even better.

(See table at propeller.wikispaces.org) All instructions are going to be single cycle, real fast SPI...

Cluso99 · 2008-07-21 15:34

I didn't explain it enough - I believe Chip asked what was thought about giving some cogs preference to hub access (programmable) like 1:16, 1:8, 1:4. I am not sure what happened - guess we will wait and see because I understand they are just waiting for validation software to test before committing to silicon.

Timmoore · 2008-07-21 16:16

@Ale, you can currently run SPI at a 10 MHz clock; there are not many SPI chips that run that fast. I did a RAM SPI driver using a counter as a clock. The only instructions you need are input/output, so for 1-bit SPI you need a rotate into C and a muxc for output, and similar for input (test and rotate). If you unroll the loop the necessary number of times you get 2 instructions per bit, i.e. 10 MHz. Then you set up a counter to generate the SCK, interleaved correctly with the read/write code. Take a look in the SRAM expansion thread to see how it's done. That code is for 4-bit-wide SRAM so it has and/or instructions, but the adaptation for 1 bit is straightforward.
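
Here is a quick sketch of where the 10 MHz figure comes from, plus a model of shifting a byte out MSB first. The 80 MHz clock and 4 clocks per instruction are my assumptions about a typical setup, and the function names are invented; the 2 instructions per bit is the figure from the post.

```python
def spi_bit_rate(clock_hz=80_000_000, clocks_per_instr=4, instrs_per_bit=2):
    """Bits per second when every bit costs a fixed number of instructions."""
    return clock_hz // (clocks_per_instr * instrs_per_bit)

def spi_shift_out(byte):
    """Model the output side: rotate the top bit into 'C', then drive the pin from C."""
    bits = []
    for _ in range(8):
        carry = (byte >> 7) & 1      # the bit that the rotate would put in C
        byte = (byte << 1) & 0xFF    # shift the remaining bits up
        bits.append(carry)           # muxc would copy C to the output pin here
    return bits

print(spi_bit_rate())
print(spi_shift_out(0xA5))
```

The model ignores the counter-generated clock entirely; it only shows the data path and the rate arithmetic.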

Phil Pilgrim (PhiPi) · 2008-07-21 17:35

Ugh! The thread that just won't die! (I wonder if the forum software will break after 65535 views.)

Seriously, this topic's occasional spurts of life may give false encouragement to those who imagine there's still time to influence the Prop II's design. OTOH, the thread is a convenient common bucket into which the entrails of vain hope, unrequited desire, rampant rumor, and idle speculation can be tossed. So I guess it still serves a purpose, despite its overripeness.

-Phil

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
'Still some PropSTICK Kit bare PCBs left!

Post Edited (Phil Pilgrim (PhiPi)) : 7/21/2008 6:19:06 PM GMT
