Prop 2 thoughts

darco · 2010-07-07 00:54

Hey guys. It's been a while. Hopefully I'll start posting again more often soon.

I realize that it is almost certainly too late for any hardware features like new instructions to be added to the prop 2 at this point, but I figured it wouldn't hurt to mention a few ideas anyway.

I think one of my most wanted features is a Compare-And-Swap (CAS) assembly instruction for accessing and modifying information in HUB RAM.

I know what you are thinking. Compare and swap? Why would that be useful?

Compare-and-swap is a primitive that you use to build lockless atomic operations. With this instruction (and a few others) you can perform atomic mathematical operations on values in HUB RAM without using any locks; for example, you could atomically increment a value in HUB RAM. In fact, you could implement your own locks using this primitive. (Heck, I would give up the current lock instruction for CAS.)
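To make the idea concrete, here is a minimal sketch in portable C11, not Propeller code. It assumes a host compiler with `<stdatomic.h>`; the names `hub_counter`, `lock_word`, `cas_add`, `try_lock` and `unlock` are made up for illustration, and the `_Atomic` variables only stand in for longs in HUB RAM. It shows an atomic add built from a CAS retry loop, and a simple lock built from the same primitive.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a long in HUB RAM shared by several cogs. */
static _Atomic uint32_t hub_counter = 0;

/* Atomically add `delta` using only compare-and-swap; returns the new value. */
static uint32_t cas_add(_Atomic uint32_t *addr, uint32_t delta)
{
    uint32_t old = atomic_load(addr);
    /* If another writer got in first, the call fails and reloads `old`
       with the current value, so the loop simply retries. */
    while (!atomic_compare_exchange_weak(addr, &old, old + delta)) {
    }
    return old + delta;
}

/* A lock built from the same primitive: 0 = free, 1 = held. */
static _Atomic uint32_t lock_word = 0;

/* Returns 1 if the lock was taken, 0 if someone else already holds it. */
static int try_lock(_Atomic uint32_t *lock)
{
    uint32_t expected = 0;
    return atomic_compare_exchange_strong(lock, &expected, 1);
}

static void unlock(_Atomic uint32_t *lock)
{
    atomic_store(lock, 0);
}
```

Calling `cas_add(&hub_counter, 1)` from several threads never loses an update, because a thread whose snapshot went stale just retries with the fresh value.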

It would be great to have a way to communicate with other cogs that didn't involve the I/O pins or HUB operations. For example, having an internal port which would work just like PORTA, but would only be used for cog-to-cog signaling. I would use it for synchronization, real-time data passing, high-speed cog-to-cog data transfer, etc.

And that reminds me. Another feature that I would love would be a configurable 32-bit shift-register for each cog. The shift register would have a configurable data-in pin and clock pin. The clock would not be dependent on the clock frequency of the propeller, which would allow the propeller to read serial data much faster than currently possible. Think of it as the reverse of the purpose of the video circuitry.

Built-in, configurable pull-up resistors on all of the I/O pins would be a welcome addition.

Since the HUB RAM is now going to be 32-bit addressable, it would be great if we had the ability to dedicate a section of I/O pins as an optional memory bus. This would allow us to hook up all sorts of peripherals, as well as add more RAM.

In addition to a Spin interpreter, it would be great to also have a more C-compiler-friendly bytecode interpreter in ROM, something that someone could write a GCC port for without having an aneurysm. This would be great for implementing complex systems such as network stacks. Supporting something like LLVM in ROM would be great, but it should also have something more compact like Spin. Like an LLVM 'thumb' mode, or something... With this in place, we could easily port Contiki to the Prop-2!

Let's remove that silly limitation in Spin that you can't add a zero byte into an in-line string. If I want to add a zero into the middle of a string that I am passing into a function I am calling, I should be able to do so. (If this has already been fixed, never mind... ;) )

I realize I'm rather late to the game on all of these suggestions, but oh well.

Thoughts?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
—
darco
www.deepdarc.com/
xmpp:darco@deepdarc.com

Post Edited (darco) : 7/7/2010 12:59:40 AM GMT

Tubular · 2010-07-07 01:24

You're in luck with at least half of these already. Chip talked about the latest features the weekend before last at UPEW. Hopefully there will be a video or transcript or wiki update soon. As I understood it...

1) There is going to be a flexible 32-bit register used for inter-cog comms. I believe each cog has one register, and any other cog can view it. There is also some kind of OR mask arrangement to check multiple cogs at a glance.
2) There are some new pull-ups and pull-downs, as well as current sources/sinks, programmable on a per-pin basis. I'll post the link to the new I/O map when I find it in a moment.
3) We're back onto 96 I/O pins, and Chip is working on an external SDRAM interface. Against this, the hub RAM now looks like 256kB rather than 384kB. It may be less in the final washup, but if we have external SDRAM this is not of huge concern.

Cluso99 · 2010-07-07 02:00

darco: You really should read the older threads. Many things have already been dealt with.

We are unsure of exactly what the counters will look like, but we have asked for them to be able to shift in too. They are way more complex and configurable than the current ones.

That silly 0 termination, I'm afraid, is the standard on most platforms (including C?) - don't expect it to go away.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:

· Home of the MultiBladeProps: TriBlade, RamBlade, SixBlade, website
· Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator)
· Prop Tools under Development or Completed (Index)
· Emulators: CPUs Z80 etc; Micros Altair etc; Terminals VT100 etc; (Index) ZiCog (Z80), MoCog (6809)
· Prop OS: SphinxOS, PropDos, PropCmd; Search the Propeller forums (uses advanced Google search)
My cruising website is: www.bluemagic.biz; MultiBlade Props: www.cluso.bluemagic.biz

darco · 2010-07-07 02:11

You can absolutely insert a properly escaped zero character into an inline string in C. It's Spin that thinks it is smarter than you and explicitly disallows escaping the null character. I think you misunderstood what I wanted. I already know that any time you use an inline string it will be zero terminated. That is fine. I want the ability to insert a zero like I would insert any other character that would need escaping.
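For reference, a small C sketch of the point about escaping. The names `payload` and `payload_size` are hypothetical. The literal carries a zero byte in the middle, the compiler still appends the usual terminator, and the caller passes the length explicitly because `strlen` stops at the first zero.

```c
#include <stddef.h>
#include <string.h>

/* "AB\0CD" holds 'A', 'B', 0, 'C', 'D' plus the automatic terminator: 6 bytes.
   Careful: an escape like "\012" would be read as one octal character, so the
   byte after "\0" must not be an octal digit (or the literal must be split). */
static const char payload[] = "AB\0CD";

/* Hypothetical helper: the number of payload bytes, excluding the
   terminator the compiler adds at the end. */
static size_t payload_size(void)
{
    return sizeof payload - 1;
}
```

Here `strlen(payload)` reports 2 while `payload_size()` reports 5, which is why a function receiving such a string needs a separate length argument.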

Regarding checking the older forum posts, I figured checking the wiki would be suitable due diligence and that digging thru the forum would be unnecessary. I didn't mean to spam the board.

edited to sound less aggressive. Sorry about that.

Post Edited (darco) : 7/7/2010 2:23:51 AM GMT

Tubular · 2010-07-07 02:33

darco said...
Regarding checking the older forum posts, I figured checking the wiki would be suitable due diligence and that digging thru the forum would be unnecessary. I didn't mean to spam the board.

Ahh if only. There have been recent changes, which means some of the previous information is already wrong. So going back through old threads will not necessarily help. The wiki is generally good, but hasn't been updated with the latest. I believe the current state is 96 I/O, 8 cogs, 256kB hub, 128-pin QFP, and up to 24-bit colour at 1080 lines (somehow).

Also I had a really good look for the "pin modes" proposed on the turboprop, that I think Beau posted in bitmap/jpg/png form. I cannot find it, sorry. It showed the various options for each pin, including a few different strength pullups, current sources, ADC fast and ADC precise modes, etc. Cluso, do you know where to find it?

cheers
tubular

hinv · 2010-07-07 02:54

With an SDRAM, it sounds like the Prop 2 will be quite the gaming platform.

Is there going to be a single-cycle mul?

Tubular · 2010-07-07 04:01

Found it!
Post from Chip: http://forums.parallax.com/forums/attach.aspx?a=44360

darco · 2010-07-07 16:38

Any thoughts from people about the CAS instruction?

Mike Green · 2010-07-07 16:56

As far as I can see, Compare-And-Swap is just another case of Test-And-Set which is the basic multiprocessor lock mechanism. It's fine, but any of them will do when you need them. The most common multiprocessor synchronization problem is buffer filling / emptying and that can be done without any locks at all.
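The lock-free buffer case can be sketched as a single-producer, single-consumer ring buffer. This is a minimal sketch in host C11 rather than Propeller code, and `ring_t`, `ring_put`, `ring_get` and `RING_SIZE` are hypothetical names. It works without locks only because each index has exactly one writer: the producer alone advances `head`, and the consumer alone advances `tail`.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 16u  /* must be a power of two so index wraparound stays consistent */

typedef struct {
    uint8_t data[RING_SIZE];
    _Atomic uint32_t head;  /* count of bytes written; touched only by the producer */
    _Atomic uint32_t tail;  /* count of bytes read; touched only by the consumer */
} ring_t;

/* Producer side: returns 1 on success, 0 if the buffer is full. */
static int ring_put(ring_t *r, uint8_t byte)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return 0;
    r->data[head % RING_SIZE] = byte;
    /* Publish the byte only after it has been stored. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}

/* Consumer side: returns 1 and fills *out on success, 0 if the buffer is empty. */
static int ring_get(ring_t *r, uint8_t *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return 0;
    *out = r->data[tail % RING_SIZE];
    /* Free the slot only after the byte has been copied out. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 1;
}
```

With more than one producer or more than one consumer this scheme no longer holds, and that is where a lock or a CAS-style primitive comes back in.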

As with any proposal for the Prop II, the question is mostly how much time and effort (and other resources) it would take and whether a proposed feature would offer significant advantages over one already in the Prop II. I don't see this suggestion as being very high on a list somewhere. There's nothing wrong with it. It's just that there's an existing mechanism that's more than adequate for the task and not used very much at that. The Prop II is much more capable of running out of locks than the Prop I, but at what cost do we add another mechanism and where would the payback be in terms of things that might be doable that could not be done otherwise? I don't know, but that's the proper discussion, not whether it's "good" or not or better than some other proposal.

Invent-O-Doc · 2010-07-07 17:17

I'm amazed that a few people from a small company can design their own microcontroller; particularly an innovative one like the prop/prop2.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Tom Talbot
New Market, MD, USA

Mike Green · 2010-07-07 17:32

Tom,
What's even more amazing is that "a few people" is really one person, Chip Gracey. Several others have contributed to the Prop, but it's mostly the work of one person.

There's a theme that goes through the tech industry where most design work is done by committee and large teams. Historically, the really innovative stuff has always been done by a single individual or a small group, perhaps as large as 3-5 people. If you go through History of Technology literature, you'll find that communications barriers begin to interfere with creativity in any group much larger than that. Lockheed's "Skunk Works" was an attempt to provide that kind of creative environment within a large corporation. Datapoint, a business computer manufacturer in the 1970's and 80's, developed their networked operating system (RMS) that way. Their R&D department was quite small and products were often developed by individuals or very small groups.

kf4ixm · 2010-07-07 21:52

I thought a while back there was a prop 2 'wish list' post, but for the life of me I can't find it now, so I guess this is close enough to post an idea...

Does anyone else think that an embedded RTC (real time clock) in the prop 2 would be a good idea? I'm sure someone has already thought of this and it may or may not be feasible.

Any thoughts on why or why not this would be a great addition to the prop 2?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I'm just 7*10²⁷ atoms floating through time and space in close formation. -KF4IXM

darco · 2010-07-07 21:53

Why embed dedicated hardware when you can just write one?

Leon · 2010-07-07 21:58

It would go against the general design philosophy of the Propeller. If that was included, why not a UART, and so on?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Leon Heller
Amateur radio callsign: G1HSM

BigFoot · 2010-07-08 01:06

It is so easy to use a cog as a UART, why would you want to put one on the chip?

I have to admit that a RTC would be nice though.

Russ :)...

localroger · 2010-07-08 01:35

RTCs are fairly specialized. You need a clock that can stay hyper-accurate and keep updating its internal counters on microamps from a backup battery. The Prop can do the microamps thing, but making it do the hyper-accurate clock thing on a microamp budget would probably screw up the rest of its clockiness, especially the ability to PLL up to 80 MHz from a 5 MHz resonator when NOT in microamp mode.

BigFoot · 2010-07-08 01:42

Good point, I wonder what the Propeller 2's idle current will be.

Russ

Leon · 2010-07-08 07:43

Rather high.

jazzed · 2010-07-08 18:00

Maybe Chip can talk about this if he has time:

There is apparently an SDRAM mode (a fixed peripheral) judging by the clock out and data I/O entries in the table. I'm curious exactly how the SDRAM clock would be activated and how those entries are used. Seems like one or more separate PASM instructions could be used for this.

Assuming byte-wide only access for the moment, the data pins should be inputs or outputs depending on the transaction type. If a rdxbyte instruction was available, the data I/O lines could be atomically put to input state saving one or more PASM instructions. A wrxbyte could set data I/O to outputs. Also such instructions could automatically assert the read or write lines if there was a write out pin assignment similar to the clock out saving more instructions. Also, having the clock out pin be optional would be good for cases where SRAM might be appropriate.

For variants other than rdxbyte/wrxbyte such as rdxword, rdxlong, rdxquad, rdxhextuple etc... the clock out could be strobed by 1, 2, 4, 8, or 16 according to the SDRAM burst requirements. I also wonder if there might be a way to transfer data directly to/from HUB for buffer operations.

We are all used to rdlong and friends by now, so a rdxlong should be easy to digest. Such instructions could take multiple ticks because of access timing and the number of bytes to read.

Much of this flies in the face of Propeller flexibility as we know it, but if it can be future proofed with the most common operations, a great speed advantage can be realized and would give other micros with such bus access a run for their money.

Cheers,
--Steve

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Pages: Propeller JVM

Cluso99 · 2010-07-08 23:42

jazzed: Did you recall that Chip has said there will be 512 longs of FIFO for each cog? This is going to aid some things depending on how it works. Unfortunately it cannot be paged into cog memory, presumably because it does not have 4-way access as the cog memory does.

Bill Henning · 2010-07-08 23:59

In my recollection, it is 128 longs, used as a FIFO or LUT.

Given how Chip now plans to support 24 bit color, I would not be surprised if it ends up being 256 longs (for 256x 24 bit LUT entries)

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com
My products: Morpheus / Mem+ / PropCade / FlexMem / VMCOG / Propteus / Proteus / SerPlug
and 6.250MHz Crystals to run Propellers at 100MHz & 5.0" OEM TFT VGA LCD modules
Las - Large model assembler Largos - upcoming nano operating system

jazzed · 2010-07-09 02:38

A FIFO/LUT would be useful.

Perhaps my specific questions about the SDRAM interface are still worth answering.

Thanks,
--Steve

Cluso99 · 2010-07-09 04:15

Bill: I certainly could be wrong, but I remember asking if we could bank switch it. However, I believe it has only a single (maybe dual) access where the cog memory is quad. Anyway, we will see.

cgracey · 2010-07-10 04:27

per Steve's SDRAM ideas...

There will be SDRAM support on the pins. SDRAM is too complicated to talk to via RDxxxx/WRxxxx instructions, though a cog can be programmed as a server to the other cogs. SDRAM is like a freight train. You have to turn it around to change memory areas, but once turned, it can go fast. In a PC, this is mitigated by on-CPU caches which regulate data flow. Since we don't have caches, we will talk most-efficiently with an SDRAM by serving blocks of data upon request, into and out of hub RAM, for use by other cogs. It's shaping up that we will be able to transfer 16 bits per clock between the pins and hub RAM, using a 16-bit-wide SDRAM. This means bandwidth approaching 320MBytes/second. The bigger the packet size, the closer to 320MB/s.

If you want to do 24-bits-per-pixel 1080p HDTV signalling, you'll need to have 32 bits of SDRAM width to bump up the cog/SDRAM bandwidth to 640MB/s. You still can't get through to hub RAM any faster than 320MB/s, though.
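As a rough check on these figures, here is a small C sketch of the arithmetic. Several inputs are assumptions not stated in the post: a transfer clock of 160MHz (inferred from 16 bits per clock giving 320MB/s), a 1920x1080 active picture at 60 frames per second, pixels packed as 3 bytes, and blanking ignored. The function names are hypothetical.

```c
#include <stdint.h>

/* Peak bus bandwidth in bytes per second for a given bus width and clock. */
static uint64_t bus_bytes_per_sec(uint32_t bus_bits, uint64_t clock_hz)
{
    return (uint64_t)(bus_bits / 8) * clock_hz;
}

/* Raw pixel data rate in bytes per second for the active picture only. */
static uint64_t video_bytes_per_sec(uint32_t width, uint32_t height,
                                    uint32_t bytes_per_pixel,
                                    uint32_t frames_per_sec)
{
    return (uint64_t)width * height * bytes_per_pixel * frames_per_sec;
}
```

Under those assumptions, `bus_bytes_per_sec(16, 160000000)` gives 320,000,000 and `bus_bytes_per_sec(32, 160000000)` gives 640,000,000, while `video_bytes_per_sec(1920, 1080, 3, 60)` gives 373,248,000. That is about 373MB/s of pixel data, more than a 16-bit bus can supply but comfortably within the 32-bit figure.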

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Chip Gracey
Parallax, Inc.

Post Edited (Chip Gracey (Parallax)) : 7/10/2010 4:32:15 AM GMT

Cluso99 · 2010-07-10 07:29

Wow! 320MB/s is pretty impressive for a little chip. The SDRAM interface will give us some great options here, Chip.

heater · 2010-07-10 07:41

Impressive indeed.

So what is an optimum block size for SDRAM?

I'm sure Bill Henning will want to add a driver option for such an SDRAM interface to his VMCog, which will provide the missing cache for apps that want single BYTE/WORD/LONG at a time access to a big memory space. VMCog currently deals in 512 byte "pages".

This will be so great for Catalina, Big Spin, Zog, all those emulators etc etc.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

Bill Henning · 2010-07-10 16:28

You are entirely correct.

VMCOG2 will indeed support SDRAM on Prop2. Guaranteed.

BigFoot · 2010-07-10 16:40

This is really good news and it will solve so many of our problems with swapping programs and handling a large number of bitmaps.

Russ

jmg · 2010-08-10 17:14

Cluso99 wrote: »

We are unsure of exactly what the counters will look like but we have asked for them to be able to shift in too. They are way more complex and configurable than the current ones.

Is there any info on what the counters in Prop2 will look like?
[XMOS have some nice shift features, but lack counters, esp on input/capture]

Pet peeves from evaluating other controllers :

* Counters that have lower resolution than register size
(Surely a 32 bit counter is tiny in silicon, why mess with 16 bit ones )

* Counters that cannot do reciprocal frequency counting (RFC)
( Amazing how common this problem is.
To handle RFC is not complex, but you DO need two counters to properly interconnect.
One captures a fastest-possible timebase on an edge, and the other
divides an input pin (Fu) by a fixed sample-rate amount to give that edge.
Commonly this is scaled to ~10 updates/second, but it can be 1000.
That divider can be binary, so a simple tapped divider can allow faster Fu rates.
Some chips lack the simple interconnect, needing an external pin-strap. )

* Counters that cannot do Quadrature counting
( Again, the silicon cost has to be very low..)

* Extra points go to a chip, that allows Pin rates above Core Freq

* Some controllers allow fractional clock precision on PWM applications,
i.e. with ~100MHz core rates, they have HW extensions to allow 1ns or sub-ns edge
precisions. I have not yet seen that feature extended to capture.
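The reciprocal frequency counting item in the list above comes down to one division once the two counters have done their work. Here is a minimal sketch in C, assuming the hardware delivers a count of whole input cycles and the number of timebase ticks captured across those cycles. The function name, parameter names and the 80MHz timebase in the usage note are illustrative, not taken from any datasheet.

```c
#include <stdint.h>

/* Reciprocal frequency counting: Fu = cycles * Ftimebase / ticks.
   input_cycles   - whole cycles of the input pin counted during the gate
   timebase_ticks - fast timebase ticks captured between the first and last edge
   timebase_hz    - frequency of the fast timebase
   Returns 0.0 if no ticks were captured. */
static double rfc_frequency_hz(uint32_t input_cycles,
                               uint32_t timebase_ticks,
                               double timebase_hz)
{
    if (timebase_ticks == 0)
        return 0.0;
    return (double)input_cycles * timebase_hz / (double)timebase_ticks;
}
```

For example, 1000 input cycles spanning 8,000,000 ticks of an 80MHz timebase works out to 10kHz. Resolution depends on the number of timebase ticks captured rather than on the input frequency, which is the attraction of RFC for slow signals.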

Other peeves:
* PLLs that come up short. Once you have a high frequency VCO and a control loop, why do so many have such limited divider options?
It does NOT need many bits on P,M,N for Fs = (Fvco/P)*M/N to give fine enough resolution, to greatly slash the crystals needed.
Best example device (ie best PLL control ) is the LPC17xx family.

* Multi-thread cores, that lack a Zero overhead loop opcode/structure.
DSPs have had these for years, and on small, very tight loops, the Loop counter can waste a significant % of bandwidth. Software should NOT need to decrement and test every loop, as that's an ideal hardware task.

I'm interested in how many of these 'peeves' are addressed in the Prop2?

Leon · 2010-08-10 17:48

Where did you get the idea that XMOS chips don't have counters? From the XMOS documentation:

"Port counters are 16-bit counters clocked by either an external clock, or by a divided internal reference clock. Port counters are guaranteed to perform I/O operations at precisely defined moments related to an externally visible clock signal."

jmg · 2010-08-10 18:13

Leon wrote: »

Where did you get the idea that XMOS chips don't have counters? From the XMOS documentation:

"Port counters are 16-bit counters clocked by either an external clock, or by a divided internal reference clock. Port counters are guaranteed to perform I/O operations at precisely defined moments related to an externally visible clock signal."

Yes, but notice that is a subset of what most would call a uC Counter.

They are only 16 bits, and are there for timed-out-tasks, and do not support capture, or PWM, or prescaling, or cross connection. Things get messy real fast.

I think the intention is a SW thread does 'everything else', but the SW threads have poor capture time-resolution.
