Propeller II update - BLOG

potatohead · 2014-01-09 16:08

Wow! Very good news.

You mentioned snippets going away. Now we get simple inline PASM?

Having SPIN in ROM means one SPIN, but it also means less HUB RAM. That said, I am an advocate of a ROM SPIN, because it would be like SPIN on P1 which is consistent.

David Betz · 2014-01-09 16:11

potatohead wrote: »

Wow! Very good news.

You mentioned snippets going away. Now we get simple inline PASM?

Having SPIN in ROM means one SPIN, but it also means less HUB RAM. That said, I am an advocate of a ROM SPIN, because it would be like SPIN on P1 which is consistent.

The problem is, if you add more ROM you take away RAM the way things are implemented now. The ROM is nothing but hardwired RAM cells.

potatohead · 2014-01-09 16:13

Yes, I get that and said as much. To me having many SPIN variants is worse than a small amount less HUB RAM.

jmg · 2014-01-09 16:21

MJB wrote: »

@CHIP
In Automated Testing Systems we often have the need to measure isolated voltages.
This can easily be done with external Sigma/Delta encoders like http://www.analog.com/AD7401A
This part delivers a 10MHz bitstream which gives a 16bit ADC when filtered by a SINC3 filter
(verilog code shown in datasheet). But then need an FPGA to decode it, which is high cost.

There are a growing number of these types of Serial-stream ADCs

MJB wrote: »

So I was wondering, if the P2 ADC HW might be used to take the EXTERNAL bitstream, instead of the internal one
to implement an isolated ADC.
IIRC the internal ADC works with 1st order S/D-encoder whereas this SINC3 gives MUCH better Signal to noise ratio.
Looking at the verilog code it might even be possible to use one COG @200MHz to do it in SW.
but some HW assist, if available, would be much better.

At 10MHz, using just a counter (sinc 1) you can get 16 bits at ~152 Hz update rates. (ie P1 will do this)
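
As a rough check of that figure, here is a minimal sketch of the arithmetic. It assumes a plain counter (sinc1) has to collect 2^16 one-bit samples to span a 16-bit range; the function name is made up for illustration.

```python
# Back-of-envelope check of the sinc1 (plain counter) update rate.
BITSTREAM_HZ = 10_000_000   # modulator bit rate quoted in the post
RESOLUTION_BITS = 16        # target resolution quoted in the post

def sinc1_update_rate(bitstream_hz: int, bits: int) -> float:
    """A counter must see 2**bits one-bit samples to span a bits-wide range."""
    return bitstream_hz / (1 << bits)

rate_hz = sinc1_update_rate(BITSTREAM_HZ, RESOLUTION_BITS)
print(f"{rate_hz:.1f} Hz")  # prints "152.6 Hz"
```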

The FPGA code seems to sum bits, so I think the HW assist could be a Counter, in External count-enable mode, sampled at some lower rate. The adds can now be at nibble or byte rates, or whatever slower rate is needed for the P2 SW to do the 3 adds and 3 subtracts and some shifts shown.
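
For reference, a textbook third-order sinc (CIC) decimator can be sketched as below. This is not the datasheet's Verilog and not a P2 counter mode; it is only an illustration of the "3 adds and 3 subtracts" structure, with the adds run per input bit and the subtracts run once per output word. The function name and the decimation values are assumptions. Python integers do not overflow, whereas hardware would use fixed-width wraparound arithmetic.

```python
def sinc3_decimate(bits, decimation):
    """Textbook 3rd-order CIC (sinc3) decimator over an iterable of 0/1 samples."""
    acc1 = acc2 = acc3 = 0      # three cascaded integrators (the "3 adds")
    d1 = d2 = d3 = 0            # previous values kept for the three differences
    out = []
    for n, bit in enumerate(bits, start=1):
        acc1 += bit
        acc2 += acc1
        acc3 += acc2
        if n % decimation == 0:
            # three cascaded differences at the slow rate (the "3 subtracts")
            diff1 = acc3 - d1
            diff2 = diff1 - d2
            diff3 = diff2 - d3
            d1, d2, d3 = acc3, diff1, diff2
            out.append(diff3)
    return out

# An all-ones stream settles to decimation**3 after the first two outputs.
print(sinc3_decimate([1] * 64, 4)[2:5])  # prints [64, 64, 64]
```

The first two output words are start-up transients and would normally be discarded.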

Such ADCs can generate a CLK, or receive a CLK - Hopefully, the expanded P2 counters will support both modes, and be able to 'collect' ADC streams from such external sources.

Total channels would be limited to the counter channels, minus 1 for CLK generate (master clk case), or = Ctr Chans, if each Ctr has a pin-pair of CLK/DAT feeding it.
Did Chip mention 4 counters per COG now, in some modes ?

ctwardell · 2014-01-09 16:21

potatohead wrote: »

Yes, I get that and said as much. To me having many SPIN variants is worse than a small amount less HUB RAM.

The problem with putting it in ROM is it puts it on the critical path for the shuttle run. I also would prefer to not lose any RAM.

C.W.

SRLM · 2014-01-09 16:25

cgracey wrote: »

Yes. That's going to be important to Spin - to be able to execute PASM and even background PASM in other tasks of the same cog.

One of my favorite features of PropGCC is inline ASM.

David Betz · 2014-01-09 16:31

potatohead wrote: »

Yes, I get that and said as much. To me having many SPIN variants is worse than a small amount less HUB RAM.

Yeah, I guess you did. I misread your post. I thought you wanted to save hub RAM by putting Spin in ROM. In any case, I think there is a big advantage to having Spin in RAM. You can fix bugs and improve it. I wonder how much better Spin would be now if we hadn't been stuck with the same VM for 8 years? I'm sure Chip could have come up with some improvements!

David Betz · 2014-01-09 16:33

cgracey wrote: »

Yes. That's going to be important to Spin - to be able to execute PASM and even background PASM in other tasks of the same cog.

Any chance you'll change the Spin compiler to generate native code? That would be a real performance boost!

jmg · 2014-01-09 16:33

potatohead wrote: »

Having SPIN in ROM means one SPIN, but it also means less HUB RAM. That said, I am an advocate of a ROM SPIN, because it would be like SPIN on P1 which is consistent.

Problem is, on P2, ROM is the same silicon cost as RAM.
So you want to keep ROM to a minimum.

RAM-Spin has many advantages, and I would not worry about more than one spin - if/when Spin becomes adaptive, then every project will have a different spin at the binary level (but, same Spin at Source level ) - unused stuff is left out, allowing some user libraries to effectively become native spin calls. You simply cannot do that with ROM spin.

RAM-Spin also allows Parallax to release a P1-cloned version of Spin, that can even be time-compatible and a P2 Spin, that can run faster and have more powerful functions.
There is no reason both Spins could not co-exist in one P2, in separate COGS.

David Betz · 2014-01-09 16:37

jmg wrote: »

Problem is, on P2, ROM is the same silicon cost as RAM.
So you want to keep ROM to a minimum.

RAM-Spin has many advantages, and I would not worry about more than one spin - if/when Spin becomes adaptive, then every project will have a different spin at the binary level (but, same Spin at Source level ) - unused stuff is left out, allowing some user libraries to effectively become native spin calls. You simply cannot do that with ROM spin.

RAM-Spin also allows Parallax to release a P1-cloned version of Spin, that can even be time-compatible and a P2 Spin, that can run faster and have more powerful functions.
There is no reason both Spins could not co-exist in one P2, in separate COGS.

I hope that Spin2 ends up being a strict superset of Spin. I understand that PASM2 won't even be source compatible with PASM but it should be possible for Spin2 to be a superset of Spin.

jmg · 2014-01-09 16:47

David Betz wrote: »

I hope that Spin2 ends up being a strict superset of Spin. I understand that PASM2 won't even be source compatible with PASM but it should be possible for Spin2 to be a superset of Spin.

strict superset can mean many things - besides code-compatible, there is also time-compatible, and you don't really want Spin2 to be always crippled to Spin1 speeds; instead you want that to be an option.

Also PASM2 might not be 'native' source compatible with PASM, but it should be possible to use macros to have a P2 assembler accept P1 Assembler source - even if it sometimes creates a slightly larger binary image.

David Betz · 2014-01-09 17:44

jmg wrote: »

strict superset can mean many things - besides code-compatible, there is also time-compatible, and you don't really want Spin2 to be always crippled to Spin1 speeds; instead you want that to be an option.

Also PASM2 might not be 'native' source compatible with PASM, but it should be possible to use macros to have a P2 assembler accept P1 Assembler source - even if it sometimes creates a slightly larger binary image.

I wouldn't want Spin2 to be constrained to perform at Spin speeds. I was just talking about source code compatibility.

Cluso99 · 2014-01-09 18:09

Chip, this is fantastic news (hubexec mode hw being simplified)!

Spin2 will be much faster running from hub where the memory constraints will permit spin to be much faster. Yet another boost!

cgracey · 2014-01-09 18:23

David Betz wrote: »

Any chance you'll change the Spin compiler to generate native code? That would be a real performance boost!

I've pondered that. You could instantiate objects you want to run quickly with some extra keyword to cause them to compile to PASM, as opposed to Spin byte codes. Or, you could force compilation within an object. By providing for both possibilities, the user could trade density for speed, where required.

Sapieha · 2014-01-09 18:32

Hi Chip.

Very good news.

David Betz · 2014-01-09 18:42

cgracey wrote: »

I've pondered that. You could instantiate objects you want to run quickly with some extra keyword to cause them to compile to PASM, as opposed to Spin byte codes. Or, you could force compilation within an object. By providing for both possibilities, the user could trade density for speed, where required.

That sounds excellent! I've wanted to be able to combine CMM and LMM modes of PropGCC but there isn't enough space in the COG for both. Now with hub execution it might be possible for PropGCC to do what you're suggesting as well. Great idea!

jmg · 2014-01-09 18:42

David Betz wrote: »

I wouldn't want Spin2 to be constrained to perform at Spin speeds. I was just talking about source code compatibility.

but once you offer that, users expect code to run as compiled - ie work as it did before.

They will take working code, compile it, get no errors, and expect it to run.

That means providing a speed option switch for P2, to make it timing compatible. (as much as practical)

David Betz · 2014-01-09 18:44

jmg wrote: »

but once you offer that, users expect code to run as compiled - ie work as it did before.

They will take working code, compile it, get no errors, and expect it to run.

That means providing a speed option switch for P2, to make it timing compatible. (as much as practical)

Doesn't most Spin code that needs to be cycle accurate use waitcnt? That code should still work fine.

jmg · 2014-01-09 19:05

David Betz wrote: »

Doesn't most Spin code that needs to be cycle accurate use waitcnt? That code should still work fine.

If the user codes that way, yes, I would expect it to operate ok, well at least at the 80MHz clock levels.

For best backward compatible modes, a choice to run CNT inc's at 80MHz would be needed to be 'source compatible'
That should be the default and for 160MHz CNT inc's at half of Fsys should be an option.

Others simply use a FOR loop as a delay element, or just lines of code, and that code would not work the same on faster spin.

David Betz · 2014-01-09 19:09

jmg wrote: »

If the user codes that way, yes, I would expect it to operate ok, well at least at the 80MHz clock levels.

For best backward compatible modes, a choice to run CNT inc's at 80MHz would be needed to be 'source compatible'
That should be the default and for 160MHz CNT inc's at half of Fsys should be an option.

Others simply use a FOR loop as a delay element, or just lines of code, and that code would not work the same on faster spin.

To be honest, I would never expect timing critical code to run without modification on a new version of a processor. It is not unreasonable to expect to have to tweak that code or rewrite it for P2. However, most applications will probably have a fair amount of code that is not timing critical. That should run without modification.

Yanomani · 2014-01-09 19:19

Chip

IMHO, thinking further, it's also advisable to craft some library identifier, to ease a possible re-collection function that scans for multiple instances of hub-resident PASM routines, generated by many SPIN programs running on multiple Cogs. Then, before closing a project, a final cleanup pass could "shrink" the whole code base, allowing a bit of Hub memory to be saved.
I'm well aware that this feature isn't trivial at all, because some engage/disengage mechanism must be crafted to keep track of the number of processes that require a specific function to be present in Hub RAM at a given time.
Since it's a kind of overlay load/release controller, it should behave as some kind of "system call", so I'm not totally sure that it will fit the "spirit" of the SPIN language.
Things could also get complicated by the fact that many of the running SPIN programs, on different COGs, could be gathering their next functions from some external device, such as an Eprom or SD card, or relying on another Cog to do that job for them. In that case, though, the loader Cog could handle the library check function and react accordingly.

Yanomani

jmg · 2014-01-09 19:38

David Betz wrote: »

To be honest, I would never expect timing critical code to run without modification on a new version of a processor. It is not unreasonable to expect to have to tweak that code or rewrite it for P2. However, most applications will probably have a fair amount of code that is not timing critical. That should run without modification.

It can be done, and it is done - One example - take a look at the Atmel AT89LP51 series.

This has a core nominally 12x faster than the devices it replaces, but it has choices of CORE and TIMER prescalers, to give either fully-slow aka compatible operation, or you can speed the core, and keep the timers/uarts etc running at same-code speeds, or you can run everything at x12 speeds.

eg Something as simple as a prescaler-choice in P2, would allow at least all the 'proper' code using a CNT, to still run with a core fsys of 160MHz.
In P2 that 160MHz may be dictated by new code, in one COG, whilst the user (of course) prefers to run 'std' code, on other COGs.
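
A small sketch of why a prescaler would preserve existing constants, assuming CNT increments at Fsys divided by a selectable prescale value. The function name, the 1 ms constant, and the prescale parameter are hypothetical and only illustrate the idea; no such P2 register is confirmed here.

```python
P1_CNT_HZ = 80_000_000  # CNT rate that existing 80MHz P1 constants assume

def wait_seconds(ticks: int, fsys_hz: int, cnt_prescale: int) -> float:
    """Real time of a waitcnt-style delay when CNT runs at fsys_hz / cnt_prescale."""
    return ticks * cnt_prescale / fsys_hz

# Hypothetical legacy source with a hard-coded 1 ms delay for an 80MHz P1.
ticks_1ms = P1_CNT_HZ // 1000

p1_baseline = wait_seconds(ticks_1ms, 80_000_000, 1)     # original timing
fast_unscaled = wait_seconds(ticks_1ms, 160_000_000, 1)  # delay is halved
fast_prescaled = wait_seconds(ticks_1ms, 160_000_000, 2) # delay is preserved

print(p1_baseline, fast_unscaled, fast_prescaled)
```

With the prescaler at 2, the unchanged constant gives the same real-time delay even though the core clock has doubled.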

It may even be practical to have a COG /2, which would skip every 2nd CLK, and save power, but HUB access would still need to be active in that COGs time-window, even if 50% of the time.

Bill Henning · 2014-01-09 20:09

Split the Spin cog into two threads, and have the spin thread only get 1/16 (or a few more) slices.

jmg wrote: »

It can be done, and it is done - One example - take a look at the Atmel AT89LP51 series.

This has a core nominally 12x faster than the devices it replaces, but it has choices of CORE and TIMER prescalers, to give either fully-slow aka compatible operation, or you can speed the core, and keep the timers/uarts etc running at same-code speeds, or you can run everything at x12 speeds.

eg Something as simple as a prescaler-choice in P2, would allow at least all the 'proper' code using a CNT, to still run with a core fsys of 160MHz.
In P2 that 160MHz may be dictated by new code, in one COG, whilst the user (of course) prefers to run 'std' code, on other COGs.

It may even be practical to have a COG /2, which would skip every 2nd CLK, and save power, but HUB access would still need to be active in that COGs time-window, even if 50% of the time.

jmg · 2014-01-09 20:17

Bill Henning wrote: »

Split the Spin cog into two threads, and have the spin thread only get 1/16 (or a few more) slices.

Sounds good, at 160MHz, to target a 20MHz P1 rate, that would be 8:1, 200MHz would be 10:1 - is that a supported number ?
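
The ratios can be checked with a few lines. This sketch assumes the 20MHz target comes from an 80MHz P1 taking 4 clocks per instruction, and that the Spin thread gets exactly one slot in every N; the helper name is made up.

```python
P1_OPS_HZ = 20_000_000  # 80MHz P1 at 4 clocks per instruction

def slots_per_spin_slot(fsys_hz: int, target_ops_hz: int = P1_OPS_HZ) -> int:
    """Return N such that giving the Spin thread 1 slot in N hits the target rate."""
    ratio, remainder = divmod(fsys_hz, target_ops_hz)
    if remainder:
        raise ValueError("system clock is not a whole multiple of the target rate")
    return ratio

print(slots_per_spin_slot(160_000_000))  # prints 8
print(slots_per_spin_slot(200_000_000))  # prints 10
```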

A Spin-compatible mode, would use older P1 version Spin build, no extra features/improvements added.
That combination should get quite close to same-op-rate as P1

That leaves the CNT inc rate, where the P1 'expects' 80MHz ?

potatohead · 2014-01-09 20:41

That seems like an awful lot of complexity.

We gave up backward compatibility as a design parameter very early on. Similarities are there where they made sense.

Generally speaking, we are used to running speed-variable code, as P1 has been used with various xtals: 80, 96, 100, 104 MHz, etc. There is a fair amount of 80 MHz code out there, and it's often older code. Newer things, such as video drivers where the WHOP wasn't a factor in the design, generally calculate the parameters needed to operate at a given speed, for just one example. We have many others.

Doing this isn't all that hard, and we've got nice math right on the chip now too.
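
A minimal sketch of that style, using a serial bit period rather than a video driver as the example. The baud rate and the function name are assumptions chosen for illustration; the point is only that the timing value is derived from the clock frequency instead of being hard-coded for 80 MHz.

```python
def ticks_per_bit(clkfreq_hz: int, baud: int) -> int:
    """Clock ticks per serial bit, rounded to the nearest whole tick."""
    return (clkfreq_hz + baud // 2) // baud

# The same source adapts to each of the clock speeds mentioned above.
for clkfreq_hz in (80_000_000, 96_000_000, 100_000_000, 104_000_000):
    print(clkfreq_hz, ticks_per_bit(clkfreq_hz, 115_200))
```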

If this is about running P1 code, some changes are going to make sense. Once that's initiated, it also makes a lot of sense to use the new chip to its strengths as well.

Tubular · 2014-01-09 20:59

jmg wrote: »

Sounds good, at 160MHz, to target a 20MHz P1 rate, that would be 8:1, 200MHz would be 10:1 - is that a supported number ?

A Spin-compatible mode, would use older P1 version Spin build, no extra features/improvements added.
That combination should get quite close to same-op-rate as P1

That leaves the CNT inc rate, where the P1 'expects' 80MHz ?

10:1 would be supported (variable length task register). However most likely you'd want to run the system PLL multiplier at 4x (20MHz) (ie 80 MHz P1), and then set your task slicing to give you 1/4 of that (20 MIPS).
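
Both options land on the same instruction rate, as this sketch shows. It assumes one instruction per clock within a task's own slots, which is what the numbers in the post imply; the function and parameter names are hypothetical.

```python
def task_rate_hz(xtal_hz: int, pll_mult: int, slots_owned: int, slots_total: int) -> int:
    """Instruction rate of one task, assuming one instruction per owned clock slot."""
    fsys_hz = xtal_hz * pll_mult
    return fsys_hz * slots_owned // slots_total

# 20MHz crystal, PLL at 4x (80MHz system clock), Spin task owns 1 slot in 4.
slow_clock = task_rate_hz(20_000_000, 4, 1, 4)
# 20MHz crystal, PLL at 10x (200MHz system clock), Spin task owns 1 slot in 10.
fast_clock = task_rate_hz(20_000_000, 10, 1, 10)

print(slow_clock, fast_clock)  # prints 20000000 20000000
```

The difference is in what CNT sees: at an 80MHz system clock the existing P1 wait constants keep their meaning, while at 200MHz they would not without further changes.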

Cluso99 · 2014-01-09 22:29

Spin2 is never going to be cycle accurate with a P1. The instructions are not the same, etc, etc.
As has been said, we gave up compatibility years ago. Hopefully Spin2 source will be pretty much P1 compatible.

But the P2 is really a vastly superior, and non-compatible monster. Let's keep it that way because there are still lots of uses for P1 where the P2 is not suitable.

jmg · 2014-01-10 00:44

potatohead wrote: »

That seems like an awful lot of complexity.
We gave up backward compatibility as a design parameter very early on. Similarities are there where they made sense.

Cluso99 wrote: »

Spin2 is never going to be cycle accurate with a P1. The instructions are not the same, etc, etc.
As has been said, we gave up compatibility years ago.

These comments rather miss the point - Spin2 is not a ROM product, so it is no longer a single-point design.
This is not a one or other problem.

Cluso99 wrote: »

Hopefully Spin2 source will be pretty much P1 compatible.

Now you are getting closer. The first objective is for Spin2 source to be pretty much P1 compatible.

Once you have done that, you do not have to stop there, you can make Spin2 timing optionally be very close to Spin1, that is just SW - it has no impact on the Silicon design path.

Next, it also makes sense to allow the hardware to run Spin1 Constants (esp Waits) unchanged, and for that, a very simple prescaler option is needed on the chip.
Prescalers are common on many uC, as they allow centering your dynamic range, many have much wider than 1 or 2 bits of choice.

Other IC vendors understand the importance of running an existing code base, with minimal customer pain.

It makes little commercial, or technical, sense to sell a 200MHz part, but then say that all COGs have to run at 80MHz, to allow 80MHz P1 CONST loading.
Looking at the LP51 example, every timer has a choice of compatible, or fast.

evanh · 2014-01-10 01:51

cgracey wrote: »

The CTRs have modes to sum up 1's in order to realize a 1st-order delta-sigma conversion. I don't know if there will be adequate bandwidth to do any 2nd-order conversions, unless you can use every Nth bit, or MSBs of short accumulations.

With a small bit of circuitry added the two counters in a Cog could be stitched together to form a second order conversion. Based on the AD7400's example I proposed one solution not long ago - http://forums.parallax.com/showthread.php/125543-Propeller-II-update-BLOG/page134 . Drawing - http://forums.parallax.com/attachment.php?attachmentid=104153&d=1380541194

potatohead · 2014-01-10 02:51

Now you are getting closer. The first objective is for Spin2 source to be pretty much P1 compatible.

But that was not the objective at all. Compatibility was put off the table for both SPIN and PASM.

Now it is possible for anyone to load a SPIN and have it do lots of things. Perhaps this is a design path you would consider doing jmg? Seems like a significant project with lots of testing and updating to better run P1 code.

SPIN 2 is going to have some real advantages over SPIN 1 and tying it to SPIN 1 for this purpose is highly likely to both complicate SPIN 2 and limit it, both of which are undesirable.

Interesting. Earlier, when exciting new features were on the table, you would argue this project isn't about being like other vendors, and now it is like other vendors? Which is it jmg?

I would have to wonder about "a little bit of circuitry" given all the timing related things we have interacting now. For single core devices maybe this makes sense, but having variable timings on the COGS, etc... doesn't seem simple.

The chips are different enough to require changes anyway. Might be much easier to produce a filter that translates a program into SPIN 2 and from there somebody would finish the job and take advantage of chip specific features, many of which vary from P1, or simply aren't on P1.

If we do nothing at all to the silicon, there are reasonable software answers to this.

Then there is support. So we get a P2 released and we start doing things with it and suddenly P1 code doesn't work as expected. I sure would not want that expectation on the table at all. It's going to be enough to ensure P2 has the docs, tools, etc... required to maximize being a P2.

On the other hand, having a software assist in migrating code to P2 proper may well make a lot of sense and any of us sufficiently motivated to do it, simply could. Doing that may offer some opportunity for somebody here and it would be on no path at all. Optional.
