Propeller II update - BLOG

Bill Henning · 2014-01-25 20:33

I plan on using my Nanos as:

- "baby ducks" following "mama duck" (de2-115) ... prop network connected by the fast UART
- hardware add-on development (less painful if I accidentally fry a nano)
- driver development (more dev platforms)

My de2 is more for "system" development and hubexec playground

- I have to re-write a couple of VMs for the new instructions and encoding
- I have to re-write my compiler's back end

Sapieha · 2014-01-25 20:49

Hi.

To use more than 97% of the FPGA space, the entire system needs to be compiled as Logic Regions.

And sometimes that needs many compile runs to place and optimize all regions.

Cluso99 · 2014-01-25 21:02

I agree with Chip that there is plenty of programming/testing that can be done with the DE0. We certainly have plenty to do with testing the hubexec mode. My P2 Debugger will be greatly simplified and size reduced with hubexec mode instead of LMM mode, and of course speed will increase at least ~8x, if not ~16x.

I am sure Chip will be able to be persuaded to do a couple of implementations for us. I expect Brian (ozpropdev) will do some amazing things with the new features.

Currently I see no point in adding the B??? fpga to the mix for a limited 12% gain in size. We would be better off with the larger 2-3 cog $179 terasic board, or the Parallax fpga board.

This new release is certainly going to give a huge boost to the possibilities of the P2. These last 2 1/2 months have seen an amazing improvement to the P2.

Hopefully Chip may release the pnut source so we can get this running (hosting) on the P2.

I know Chip would love to do this himself, but I fear there will be too many other things for him to do.

jmg · 2014-01-25 21:33

Cluso99 wrote: »

Currently I see no point in adding the B??? fpga to the mix for a limited 12% gain in size. We would be better off with the larger 2-3 cog $179 terasic board, or the Parallax fpga board.

? These are not mutually exclusive - there is no reason not to support both Cyclone V boards.
( The $35.97 BeMicro CV and the new $199/$150e Altera DE1-SoC Board, with SDRAM )

When we are talking about 97% full parts, that 12% more is a significant gain. Expect a speed gain as well.
Add in the lower price and in-stock availability, and what is not to like?

Roy Eltham · 2014-01-25 21:44

jmg,
You want to spend all of Chip's time building for different chips/boards? Seriously, it takes a long time to do each build.

All you people talking about every different FPGA board coming out, do you want Chip spending time building different configurations for all these new FPGA boards and doing all those different builds?

Maybe after everything is done and we are just waiting for fabbing, but not before.

potatohead · 2014-01-25 21:59

That was going to be my post exactly Roy.

ozpropdev · 2014-01-25 22:00

The ideal scenario is Parallax producing their own FPGA board ASAP.

Then we all have the same functionality and compatibility. (Cogs, Ram, Peripherals and Connectors)

Chip only has to compile a single version and no one misses out on running posted code/demos.

Brian

jmg · 2014-01-25 22:25

Roy Eltham wrote: »

jmg,
You want to spend all of Chip's time building for different chips/boards? Seriously, it takes a long time to do each build.

All you people talking about every different FPGA board coming out, do you want Chip spending time building different configurations for all these new FPGA boards and doing all those different builds?

Maybe after everything is done and we are just waiting for fabbing, but not before.

Nope, read what I said earlier.

Clearly, first they release on the Boards they have now, but they are going to have to build for Cyclone V, and certainly need to do so before they finish their own board.
Then, in order to get maximum test coverage (you do want test coverage?) it is important to reach as many hands as possible.
The BeMicro board is a clear candidate there.

With the design frozen, Chip does not need full time active involvement in variant builds - hand it to an intern.

If you want to talk about Chip's time, I've said before the SerDes & Timer Docs should be next, but variant builds can be done largely in parallel with that.

jmg · 2014-01-25 22:26

ozpropdev wrote: »

The ideal scenario is Parallax producing their own FPGA board ASAP.

Then we all have the same functionality and compatibility. (Cogs, Ram, Peripherals and Connectors)

Chip only has to compile a single version and no one misses out on running posted code/demos.

It all comes down to price, and availability. FPGAs vary a lot in price, and that dominates PCB prices.

cgracey · 2014-01-25 22:37

These inexpensive Cyclone V boards are intriguing because they sometimes cost less than the actual FPGA. They often represent partnerships among several chip manufacturers who are really trying to advertise their chips. It's very tempting to take advantage of those, because all we have to do is build a simple break-out board.

If we make our own board based on the Cyclone V -A7, it's going to sell for around $400.

The big pain about supporting multiple boards is the initial setup, not so much the maintenance.

potatohead · 2014-01-25 22:45

Are you able to do that with conditional builds?

Yes, they [new FPGA boards] are tempting for that reason. The DE2 can be had for academic prices of about $200. I would go for a $400 board, if I knew it was for the longer term. P2 + type development as proposed. Learning about that is something I would very much like to do. On that basis, the board is a steal.

For just testing and general ramp up to real P2 chips, that price might not be so attractive. I don't know.

Heater. · 2014-01-26 00:31

jmg,

With the design frozen,

Yeah, right

Seems to me that having many versions of the FPGA configuration will not only waste a lot of time compiling them but also lead to a lot of confusion and support issues.

rjo__ · 2014-01-26 04:06

I am absolutely delighted that my little Nano will be supported for the next iteration. At the same time, I want as close to a full P2 as I can get. The only reason I haven't ordered a DE2-115 is because the add on boards are not yet available and the schedule for Cyclone V development is still up in the air.

My understanding is that once the current iteration is made bug free, the design will be set in stone and then ported to a Cyclone V board, where further development will occur. I personally don't care about the price, but if you consider that there will be an installed user base for some of the really cheap boards out there, I would consider going with one of the more popular boards, because the users of those boards will be able to migrate to Parallax for no additional cost. On the other hand, Parallax could support two Cyclone V boards, a generic one and Parallax's V7, by simply offering two different configurations and one additional break-out board, which could be free to teachers and discounted to students.

Shawna · 2014-02-01 11:00

Hey guys,
I have been following this thread for some time now. Most of the time I really have no clue what you guys are talking about. I was wondering when the next shuttle run is scheduled and, if it is a success, what that means for the time frame for the P2 getting out to the general public?

From reading through the threads, it doesn't look like Spin 1 will be directly compatible with Spin 2; it sounds like code will have to be ported. Am I understanding that correctly?

How much faster will Spin 2 be than Spin 1?

Is there a thread or post that outlines the current specs for the P2? I have found different sites with specs, but they seem to conflict with each other.

Thanks

Bob Lawrence (VE1RLL) · 2014-02-01 11:11

re: Prop2

http://www.parallaxsemiconductor.com/Products/propeller2specs

http://www.parallaxsemiconductor.com/sites/default/files/parallax/Propeller2DetailedPreliminaryFeatureList-v2.0.pdf

evanh · 2014-02-01 14:51

I think that is a bit out of date. It even pre-dates the first clock-rate refactor from 200MHz to 160MHz. It's back to 200MHz now.

Shawna: We're talking many months, at least six I'd guess, for saleable product. And that's assuming a perfect run first try. I wouldn't bet on it being before the end of the year. Full specs, along with lots of docs, will likely turn up during the first test shuttle.

Normally, all this grunt-work time is not on display to the public. In the meantime, here's a snippet from the start of the current Prop2_Docs.txt file:

PROPELLER 2 MEMORY
------------------

In the Propeller 2, there are two primary types of memory:

HUB MEMORY

    256K bytes of 1-port main memory shared by all cogs

        - cogs launch from this memory
        - cogs can read/write this memory as bytes, words, longs, and wides (8 longs)
        - $00000..$00DFF is ROM - contains Booter, SHA-256/HMAC, and Monitor
        - $00E00..$3FFFF is RAM - for application usage


COG MEMORY (8 instances)

    512 longs of 4-port register RAM for code and data usage

        - simultaneous instruction, source, and destination reading, plus destination writing
        - $000..$1F1 = RAM
        - $1F2       = INDA, indirect window
        - $1F3       = INDB, indirect window
        - $1F4..$1F7 = PINA..PIND, pin input (read-only)
        - $1F8..$1FB = OUTA..OUTD, pin output state control
        - $1FC..$1FF = DIRA..DIRD, pin output drive control

    256 longs of 2-port auxiliary RAM for data and video usage

        - readable and writeable via instructions or free-running pin-transfer circuit
        - video circuit can read pixel data asynchronously from second port

    4 longs of stack for CALL/RET/PUSH/POP instructions

    8 longs x 1 line of data cache for RDBYTEC/RDWORDC/RDLONGC/RDWIDEC instructions

    8 longs x 4 lines of instruction cache for executing from hub memory
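
As a reading aid for the map above, here is a small Python sketch (not part of Prop2_Docs.txt) that classifies hub addresses and names the special cog registers. The function and constant names are hypothetical; the address ranges are copied from the snippet, which describes a design that was still changing at the time.

```python
# Illustrative model of the memory map quoted above.
# All names here are made up for this sketch; only the ranges come from the snippet.
HUB_SIZE = 256 * 1024   # 256K bytes of hub memory
ROM_LAST = 0x00DFF      # last ROM byte; RAM starts at $00E00

# Special cog registers $1F2..$1FF as listed in the snippet.
COG_SPECIAL = {0x1F2: "INDA", 0x1F3: "INDB"}
for i, suffix in enumerate("ABCD"):
    COG_SPECIAL[0x1F4 + i] = "PIN" + suffix   # pin input (read-only)
    COG_SPECIAL[0x1F8 + i] = "OUT" + suffix   # pin output state
    COG_SPECIAL[0x1FC + i] = "DIR" + suffix   # pin output drive


def hub_region(addr):
    """Return 'ROM' or 'RAM' for a hub byte address."""
    if not 0 <= addr < HUB_SIZE:
        raise ValueError("hub address out of range")
    return "ROM" if addr <= ROM_LAST else "RAM"


def cog_register(addr):
    """Return the register name for a cog long address, or 'RAM'."""
    if not 0 <= addr < 512:
        raise ValueError("cog address out of range")
    return COG_SPECIAL.get(addr, "RAM")
```

For example, `hub_region(0xE00)` gives `'RAM'` and `cog_register(0x1F4)` gives `'PINA'`.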

MJB · 2014-02-06 04:27

is this still on the list for review after SERDES?
or has it been put off already?
This would go with the counters - in synthesized code - and not in the manual pin code if I get it right?
would this even be beneficial in decoding the bitstreams from the P2 internal ADCs
or would this only make sense with a higher order encoder as the front end?
Here my understanding of delta-sigma is still not complete.

Cluso99 wrote: »

evanh: I have included your chained counter proposal here. I think this is a simple and worthwhile addition to the P2 if Chip has the time (now or later).
I will try and remember it when we go thru the list later.

Attachment not found.
I thought it would be nice...
Source the Clock from an external pin.

Source FRQx from an external pin (ie mux FRQx output with an external pin)

With my mods (in red), and using the internal pins, it would be possible to chain CTRA APin to CTRB BPin to achieve your result above.

Attachment not found.

Postedit: I see Chip has commented while I was posting.

evanh · 2014-02-06 05:48

MJB wrote: »

is this still on the list for review after SERDES?
or has it been put off already?

I'm planning on bumping the matter when/if the SERDES hardware gets worked on. I know we'd be rebuffed pretty quick if we tried pushing in ahead of that. Chip hasn't yet said no.

This would go with the counters - in synthesized code - and not in the manual pin code if I get it right?

That's what I was targeting. There is something fancy in the pin circuits to preserve the DACs but I don't think it's the same for ADC pin circuits. The cog counters are it.

would this even be beneficial in decoding the bitstreams from the P2 internal ADCs
or would this only make sense with a higher order encoder as the front end?
Here my understanding of delta-sigma is still not complete.

I'm pretty certain that a multi-order demodulator helps irrespective of what variant of modulator is producing the bit-stream.

evanh · 2014-02-07 04:23

For anyone wanting a clear intro to delta-sigma ADCs I've just bumped into the following from TI - www.ti.com/lit/an/slyt423/slyt423.pdf

Happy reading. It's certainly cleared up some of my confusion over symbology. Probably the biggest clarification I got was over the 1-bit DAC in the feedback. It is digital in and out. Figure 3 even shows it as signal X4 on both sides.

There is a second part for the demodulation but it only covers the principle of filtering and doesn't directly address the integrating method we're interested in here.

MJB · 2014-02-07 09:30

For those who prefer listening to reading, here is the video explanation of sigma-delta ADCs:
http://www.ti.com/lit/zip/sbac061

evanh wrote: »

For anyone wanting a clear intro to delta-sigma ADCs I've just bumped into the following from TI - www.ti.com/lit/an/slyt423/slyt423.pdf

Happy reading. It's certainly cleared up some of my confusion over symbology. Probably the biggest clarification I got was over the 1-bit DAC in the feedback. It is digital in and out. Figure 3 even shows it as signal X4 on both sides.

There is a second part for the demodulation but it only covers the principle of filtering and doesn't directly address the integrating method we're interested in here.

MJB · 2014-02-09 09:54

Meanwhile, after some more reading, I think I understand a little more about delta-sigma ADCs.

- The order of the modulator has the biggest influence on SNR (signal-to-noise ratio).

The modulator is essentially an analog module.

[Figure: first order modulator]

[Figure: second order modulator]

As far as I understand, the P2 only has first order modulators at each pin.
Higher order modulators can only play to their strength with higher order decimation filters,
which we do not have in the P2 either.
But a higher order filter will give some benefit even with a first order modulator.
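
To put rough numbers on the effect of modulator order, here is a minimal Python sketch of the usual textbook estimate for the peak signal-to-quantisation-noise ratio of an ideal modulator. It assumes white quantisation noise and an ideal decimation filter, which a 1-bit first order loop does not really satisfy, so treat the results as a guide only. The function names are hypothetical and nothing here is specific to the P2.

```python
import math


def ideal_sqnr_db(order, osr, bits=1):
    """Textbook peak SQNR estimate in dB (white quantisation noise model).

    order: modulator order L, osr: oversampling ratio, bits: quantiser bits.
    """
    noise_shaping = 10 * math.log10(math.pi ** (2 * order) / (2 * order + 1))
    return 6.02 * bits + 1.76 - noise_shaping + (20 * order + 10) * math.log10(osr)


def gain_per_octave_db(order):
    """SQNR gained each time the oversampling ratio doubles."""
    return (20 * order + 10) * math.log10(2)
```

Under this model each doubling of the oversampling ratio is worth about 9 dB for a first order loop and about 15 dB for a second order loop, so a tenfold increase in oversampling ratio is worth 30 dB at first order.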

evanh · 2014-02-10 02:14

Yep, sounds right. Second order filtering in the counters is beneficial to the Prop2's internal ADCs.

Also, apologies for my use of the term demodulator. As much as I thought it was appropriate it would seem it's not a correct name for the filter/decimator combo. Re-sampler maybe?

Ariba · 2014-02-10 16:42

MJB wrote: »

Meanwhile, after some more reading, I think I understand a little more about delta-sigma ADCs.

- The order of the modulator has the biggest influence on SNR (signal-to-noise ratio).

The modulator is essentially an analog module.

[Figure: first order modulator]

[Figure: second order modulator]

As far as I understand, the P2 only has first order modulators at each pin.
Higher order modulators can only play to their strength with higher order decimation filters,
which we do not have in the P2 either.
But a higher order filter will give some benefit even with a first order modulator.

The Prop2 has a very high sampling frequency for the SigmaDelta ADC. Compared with a "normal" sample frequency of around 20 MHz, the 160..200 MHz of the Prop2 reduces the noise more than a higher order modulator at lower frequencies.

As I said before, you can filter the PHSx value with an IIR filter to get higher order demodulators and lower noise.
Something like this:

           .---------+-----------.
           |         |           |
           |-   cf   |           |
  phsa ---(+)---|>--(+)--[adval]-+--

   getphsa tmp     'decimation and first order from CNTR hardware
   sub tmp,adval   'second order with IIR filter
   scl tmp,cf      'cf = cutoff freq. constant
   add adval,tmp   'result in adval

I think there is really no need to add anything in hardware.
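
The filter above can be tried out on a desktop with the following Python sketch. It is a simulation under stated assumptions, not P2 code: the modulator is an idealised first order accumulator, each counter read is assumed to return the number of 1 bits seen since the previous read, and `cf` is treated as a fraction between 0 and 1. All names are hypothetical.

```python
def filtered_samples(level, window=256, reads=200, cf=0.125):
    """Simulate a 1-bit first order modulator, a counter, and a one-pole IIR.

    level:  constant analog input, normalised to 0..1
    window: clocks between counter reads (the decimation ratio)
    reads:  number of counter reads to simulate
    cf:     IIR coefficient, assumed to be a fraction in 0..1
    """
    acc = 0.0      # idealised modulator state
    adval = 0.0    # filtered result, in counts per window
    out = []
    for _ in range(reads):
        count = 0
        for _ in range(window):
            acc += level          # integrate the input
            if acc >= 1.0:        # comparator trips: emit a 1 and feed it back
                acc -= 1.0
                count += 1
        tmp = count - adval       # sub tmp,adval
        tmp *= cf                 # scl tmp,cf
        adval += tmp              # add adval,tmp
        out.append(adval)
    return out
```

With `level=0.3` and the default window of 256 clocks, the raw count alternates between neighbouring integers while the filtered value settles close to 0.3 * 256 = 76.8. A smaller `cf` smooths more but responds more slowly.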

Andy

evanh · 2014-02-11 00:56

Fair enough, Bonnie did highlight the second part as just filtering and decimating. I guess the higher order filtering can be done post-decimation; I'm not knowledgeable enough to say otherwise.

The double integration per clock did seem like a cool bit of hardware though.

Seairth · 2014-02-12 19:32

Ariba wrote: »

The Prop2 has a very high sampling frequency for the SigmaDelta ADC. Compared with a "normal" sample frequency of around 20 MHz, the 160..200 MHz of the Prop2 reduces the noise more than a higher order modulator at lower frequencies.

Is that true? I thought most other chips sample with 8-bit (or greater) resolution at those frequencies. At 200 MHz, the P2 must take 256 samples (at one sample per clock) for 8-bit resolution, so it can only get about 781 ksps. Or, going the other way, you can only get 3-4 bit resolution at 20 Msps. Wouldn't 8-bit samples have less noise than 4-bit samples?
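
The arithmetic behind those two figures can be checked with a short Python sketch. It models only a plain counter that tallies 1-bit samples, which is the assumption behind the estimate above; any filtering beyond simple counting changes the picture. The function names are hypothetical.

```python
import math


def sample_rate_for_bits(clock_hz, bits):
    """Output rate when 2**bits one-bit samples are counted per result."""
    return clock_hz / (2 ** bits)


def bits_for_sample_rate(clock_hz, sample_rate_hz):
    """Resolution in bits when counting clock_hz / sample_rate_hz samples."""
    return math.log2(clock_hz / sample_rate_hz)
```

`sample_rate_for_bits(200e6, 8)` is 781250.0, the 781 ksps quoted above, and `bits_for_sample_rate(200e6, 20e6)` is log2(10), a little over 3.3 bits.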

Ariba · 2014-02-12 20:27

I was talking about SigmaDelta ADCs which always sample with one bit. My comparison was between a SigmaDelta ADC clocked with 20 MHz with a higher order modulator and 200 MHz with a first order modulator.

Yes, there are other ADC types, like Flash-ADCs that can sample 8 or more bits at once at sample rates in the MHz (even 200 MHz or more). It would be nice to have one or two such ADCs in the Prop2 but they need a lot of comparators and a priority encoder.

Maybe we can do 9-bit ADCs by successive approximation, using the fast DACs and the fast comparator on each pin. But I think that needs an external S&H circuit. Depending on the speed of the comparator, this can result in higher sample rates than SigmaDelta for 9-bit resolution.
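
For readers who have not met successive approximation, here is a minimal Python sketch of the search itself. It assumes an ideal DAC and comparator and a normalised reference of 1.0; the function name and parameters are hypothetical and do not correspond to any P2 instruction.

```python
def sar_convert(vin, vref=1.0, bits=9):
    """Successive approximation: one DAC setting and one compare per bit."""
    code = 0
    for bit in reversed(range(bits)):          # try the MSB first
        trial = code | (1 << bit)              # tentatively set this bit
        dac_v = trial * vref / (1 << bits)     # ideal DAC output for the trial code
        if vin >= dac_v:                       # ideal comparator decision
            code = trial                       # keep the bit, otherwise drop it
    return code
```

A 9-bit result takes nine DAC-and-compare steps. The input has to stay constant across all of them, which is why a sample-and-hold circuit is mentioned above.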

Andy

Seairth · 2014-02-12 20:42

Ariba wrote: »

I was talking about SigmaDelta ADCs which always sample with one bit. My comparison was between a SigmaDelta ADC clocked with 20 MHz with a higher order modulator and 200 MHz with a first order modulator.

Ahh. Yes. That makes much more sense.

cgracey · 2014-02-13 19:07

One thing that's bugged me about the WIDEs is that they cannot do sustained reads and writes to and from the hub. If you do nine RDLONGC's in a row, you'll exhaust the data cache at some point and need another RDWIDE, but only after the hub window has just passed, forcing you to wait for the next one. This cuts the effective data rate in half. Some speculative prefetching could be done, but it's kind of messy and doesn't address the write issue. XFR gets around all this by reading or writing the WIDEs on every clock, but then you need to issue the RDWIDE/WRWIDEs to transact with the hub, which is fine for SDRAM. Getting data between cog and hub RAM is more complicated because of pipeline requirements - each read or write to cog RAM requires an instruction.

Since last night, I've been trying to come up with some way to achieve sustained transfers between the hub and cog RAM and the hub and AUX.

Here's how it will work:

New sustained-RDWIDE/WRWIDE instructions
----------------------------------------

RDWIDEA D/#	'read D/# wides using background RDWIDE PTRA++, read conduit is $1F1, 4..11 initial clocks
RDWIDEB D/#	'read D/# wides using background RDWIDE PTRB++, read conduit is $1F1, 4..11 initial clocks

WRWIDEA D/#	'write D/# wides using background WRWIDE PTRA++, write conduit is $1F1, 1..8 initial clocks
WRWIDEB D/#	'write D/# wides using background WRWIDE PTRB++, write conduit is $1F1, 1..8 initial clocks


' Transfer from hub to cog

	SETPTRA	<hubaddr>
	SETINDA	#<cogaddr>
	REPS	#<wides*8>,#1
	RDWIDEA	#<wides>
	MOV	INDA++,$1F1


' Transfer from hub to aux

	SETPTRA	<hubaddr>
	SETPTRX	<auxaddr>
	REPS	#<wides*8>,#1
	RDWIDEA	#<wides>
	WRAUX	$1F1,PTRX++


' Transfer from cog to hub

	SETPTRA	<hubaddr>
	SETINDA	#<cogaddr>
	REPS	#<wides*8>,#1
	WRWIDEA	#<wides>
	MOV	$1F1,INDA++


' Transfer from aux to hub

	SETPTRA	<hubaddr>
	SETPTRX	#<auxaddr>
	REPS	#<wides*8>,#1
	WRWIDEA	#<wides>
	RDAUX	$1F1,PTRX++

This will cut the cog loading time in half, too. Now we can breathe data between all memories at the rate of one long per clock.

You can do other things, too, like for WRWIDEA, instead of 'MOV $1F1,INDA++', you could do 'MOV $1F1,INA' to capture pins. You just need an instruction to write $1F1. RDWIDEA just needs an instruction to read $1F1, like 'MOV OUTA,$1F1' or 'SETDACS $1F1' to write four 8-bit DAC values per clock!
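
A back-of-envelope Python model of the two cases described above is sketched below. It assumes a hub window every 8 clocks, which is inferred from "one long per clock" with 8-long wides rather than stated in the post, and it ignores pipeline and setup details. The function names are hypothetical.

```python
import math

HUB_WINDOW_CLOCKS = 8   # assumed: one hub window per cog every 8 clocks
LONGS_PER_WIDE = 8      # a wide is 8 longs


def clocks_with_missed_windows(longs):
    """Cached reads as described above: after a wide is consumed, the next
    RDWIDE has just missed its window, so each wide costs two windows."""
    wides = math.ceil(longs / LONGS_PER_WIDE)
    return wides * 2 * HUB_WINDOW_CLOCKS


def clocks_sustained(longs, initial_clocks=4):
    """Proposed sustained transfer: one wide per window after an initial
    wait (the post quotes 4..11 clocks for reads, 1..8 for writes)."""
    wides = math.ceil(longs / LONGS_PER_WIDE)
    return initial_clocks + wides * HUB_WINDOW_CLOCKS
```

For a 512-long transfer this model gives 1024 clocks for the cached approach and 516 for the sustained one with a 4-clock initial wait, in line with the "cut in half" estimate.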

Bill Henning · 2014-02-13 19:21

Chip,

That is VERY cool! Great for video, and a lot more...

Cluso99 · 2014-02-13 19:25

WOW - This is fantastic Chip!

I am curious about the difference between the MOV and the WRAUX/RDAUX although I know you could not use the same MOV instruction.

Does this mean $1F1 gets clobbered, or only anytime we reference $1F1 in a MOV or RD/WRAUX instruction ?

Pity the COG RAM wasn't built in a x128 (ie blocks of 8 longs)

Isn't hindsight wonderful.
