Big update for DE2-115 and DE0-Nano users w/add-on boards

cgracey · 2013-10-01 01:50

Here's an update for you guys with Terasic boards.

A lot of enhancements have been made to the Verilog:

- Many instructions which used to need polling in multitasking now jump back to themselves without stalling the pipeline, until their wait condition is met. In single-task situations, they still stall the pipeline, just as they used to:

WAITVID/GETMULL/GETMULH/GETDIVQ/GETDIVR/GETSQRT/GETQX/GETQY/GETQZ/SYNCTRA/SYNCTRB

- GETPIX now handles 5:5:5 pixels (two in D), as well as 8:8:8.

- CALLD now figures out the return address based on the number of same-task instructions in the pipeline.

- SETTASK now uses leading %00's to determine the task loop length.

- SARACCA/SARACCB/SARACCS replace the old FITACCA/FITACCB/FITACCS.

- ESWAP4/ESWAP8 (endian swaps) now take the place of the old SEUSSF/SEUSSR.

- There are new immediate modes for QSINCOS.

- REPS/REPD now properly shut off if their task was affected by a JMPTASK instruction.

- There are two serial subsystems per cog now that operate from 1 clock per bit to 65535 clocks per bit. They do 8N1, both positive and negative polarity, as well as 32N1, negative polarity, for Prop-to-Prop comms. There's also an extra 4-bit ID that can be turned on for automatic message discrimination on the receiver side.

- The following instructions have moved: SETTASK/CLRACCA/CLRACCB/CLRACCS/CACHEX/SETXCH/SETXFR/SETSKIP
- The following instructions have been removed: FITACCA/FITACCB/FITACCS/SEUSSF/SEUSSR
- The following instructions have been added: SERINA/SERINB/ESWAP4/ESWAP8/SARACCA/SARACCB/SARACCS/SEROUTA/SEROUTB/SETPERA/SETSERA/SETPERB/SETSERB

See the Prop2_Docs.txt file and the Prop2_Instructions.txt file for all the details.
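
As a rough illustration of the range the serial subsystems cover, the arithmetic below assumes the 80MHz cog clock these boards run at after a large app is loaded, and assumes the bit rate is simply the clock divided by the clocks-per-bit setting. The function names are invented for this sketch; Prop2_Docs.txt remains the reference for how the period is actually configured.

```python
CLOCK_HZ = 80_000_000  # assumed cog clock (80MHz on the DE boards)

def bit_rate(clocks_per_bit, clock_hz=CLOCK_HZ):
    """Bits per second for a given number of clocks per bit (1..65535)."""
    if not 1 <= clocks_per_bit <= 65535:
        raise ValueError("clocks_per_bit must be in 1..65535")
    return clock_hz / clocks_per_bit

def clocks_for_baud(baud, clock_hz=CLOCK_HZ):
    """Nearest whole clocks-per-bit for a target baud, plus the relative rate error."""
    n = max(1, min(65535, round(clock_hz / baud)))
    actual = clock_hz / n
    return n, (actual - baud) / baud

print(bit_rate(1))                 # fastest setting: one bit per clock
print(bit_rate(65535))             # slowest setting: roughly 1220.7 bits/s
print(clocks_for_baud(2_000_000))  # 2M baud divides evenly: 40 clocks per bit
print(clocks_for_baud(115200))     # 694 clocks per bit, with a small rate error
```

Under these assumptions, a 2M baud link divides the clock exactly, while common rates such as 115200 land within a fraction of a percent.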

Other improvements:

- The ROM monitor now supports ";" comments.

- PNUT uses a built-in 2M baud loader for large apps. Now you can ORGH all programs to $E80 that get downloaded with F11. PNUT figures out if you have a DE0-Nano and only loads the bottom 32KB.

- Spin2 is pretty much done, but not thoroughly tested yet. I need to document it. The cog RAM from $000..$139 is free for (callable or multitasking) PASM. See serialio.spin.

- Both the DE2-115 and DE0-Nano now operate at 80MHz. After loading a large app (F11/F12), the cogs are running at 80MHz. After downloading a loader (F10), they are going 20MHz. See SDRAM_Graphics6 to see CLKSET ($FF) switch the clock up to 80MHz.

Here's the file:

Prop2_Emulation_Boards.zip

Most of these improvements are a result of your involvement in this process. Thanks for all your help, Everyone! I'm thinking the Verilog is stable again now. I must expand the test suite to accommodate the newer things, because in two weeks we are going back to synthesis.

Roy Eltham · 2013-10-01 02:15

Wow, lots of really excellent changes here!

You've been busy!

Gadgetman · 2013-10-01 02:17

He's definitely NOT drinking decaf...

Heater. · 2013-10-01 02:19

Good grief Chip, that is a lot of changes!

All us owners of DE boards had better get testing this stuff pronto. Only two weeks available.

That performance boost might get ozpropdev's video glitches sorted out.

evanh · 2013-10-01 02:42

cgracey wrote: »

- Many instructions which used to need polling in multitasking now jump back to themselves without stalling the pipeline, until their wait condition is met. In single-task situations, they still stall the pipeline, just as they used to:

WAITVID/GETMULL/GETMULH/GETDIVQ/GETDIVR/GETSQRT/GETQX/GETQY/GETQZ/SYNCTRA/SYNCTRB

That sounds really cool. What a neat idea. How are the dual abilities done, though?

cgracey · 2013-10-01 03:00

evanh wrote: »

That sounds really cool. What a neat idea. How are the dual abilities done, though?

When SETTASK executes with a non-0 value, you're in multi-tasking mode. There is a single flipflop that captures the OR of all SETTASK's D/#n bits and if it's "1", then those certain instructions become jumps to themselves until their condition is met. Otherwise, they act as they used to, stalling the pipeline until their condition is met.
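
The behavior described above can be modeled in a few lines. This is only a behavioral sketch, not the Verilog: the `Cog` class, its method names, and the returned strings are invented for illustration, and it assumes the flip-flop is reloaded on every SETTASK so that a value of 0 returns the cog to single-task behavior.

```python
class Cog:
    """Toy model of the single-task vs. multitasking wait behavior."""

    def __init__(self):
        self.multitask = False  # the single flip-flop

    def settask(self, d):
        # Capture the OR of all bits in the SETTASK operand (assumption:
        # the flag is reloaded each time SETTASK executes).
        self.multitask = d != 0

    def wait_instruction(self, condition_met):
        # Models WAITVID, GETMULL, GETQX, etc. from the list in the first post.
        if condition_met:
            return "complete"
        # Multitasking: re-issue the instruction so other tasks keep running.
        # Single task: hold the pipeline until the condition is met.
        return "jump_to_self" if self.multitask else "stall_pipeline"


cog = Cog()
print(cog.wait_instruction(False))  # stall_pipeline
cog.settask(0b0100)
print(cog.wait_instruction(False))  # jump_to_self
print(cog.wait_instruction(True))   # complete
```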

ozpropdev · 2013-10-01 03:40

Fantastic work Chip!
You have been busy....
Cheers
Brian

evanh · 2013-10-01 04:20

Ah, there we go:

PASSCNT D        jumps to itself if some amount of time has not passed, use instead of WAITCNT
JP/JNP  D,S      jumps based on pin states, use instead of WAITPEQ/WAITPNE

I almost asked why some obvious WAITs were missing from the list but decided to have a read instead.

Cluso99 · 2013-10-01 04:24

Brilliant Chip! Wow you have been busy!

Regarding the serial, I was thinking of a simpler more general serial. I have not read your docs yet.

What I was generally after was just a serialiser and deserialiser using flexible clocks as you have described. If they were able to rx/tx a serial stream, we could control this by software (by specifying start/parity/stop bits by software as required). One of the particular uses I had in mind was for USB Full Speed.

So, would it be possible to have a mode that is 32 bits long without start/stop bits inserted by hardware? The tx side would just shift/output a bit each set of 'n' clocks. It just keeps going even in an underrun situation. That way we can load the tx register by software when required. The rx side would act similarly, shift/input a bit each set of 'n' clocks. Likewise, it just keeps going even in an overrun situation. Software would be responsible to read at the appropriate time. Now we could do NRZ and NRZI too.
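
To make the software side of that idea concrete, here is a minimal sketch of what a program would have to prepare before handing bits to such a raw shifter for USB: bit stuffing (a 0 inserted after six consecutive 1s) followed by NRZI encoding (a 0 toggles the line, a 1 leaves it unchanged). The function names are invented, and the levels are abstract 0/1 values rather than actual D+/D- pin states.

```python
def bit_stuff(bits):
    """Insert a 0 after every run of six consecutive 1s (USB bit stuffing)."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == 6:
            out.append(0)
            run = 0
    return out

def nrzi_encode(bits, level=1):
    """NRZI as used by USB: a 0 bit toggles the line, a 1 bit holds it."""
    out = []
    for b in bits:
        if b == 0:
            level ^= 1
        out.append(level)
    return out

sync = [0, 0, 0, 0, 0, 0, 0, 1]       # USB sync field, in transmit order
print(nrzi_encode(sync))              # alternating levels, ending with two equal
print(nrzi_encode(bit_stuff([1] * 7)))
```

The encoded stream could then be packed into 32-bit words and fed to the shifter; the receive side would run the inverse steps on the sampled bits.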

Postedit: Ignore the following as there are 2 serialisers, A & B.
USB actually uses 2 pins (both inputs or both outputs). They are normally driven with opposite polarity; driving both with the same polarity is a special case. This would require external gating.
Would it be possible to tx on 2 pins, and have control for inverting or not inverting the second output pin?

David Betz · 2013-10-01 05:08

This all sounds very cool! I am embarrassed to admit though that it's been so long since I've loaded a new FPGA configuration into either of my boards that I've forgotten the details. Where can I find instructions on programming the new FPGA configurations into the DE0-Nano and DE2-115? I looked through the files in Chip's post and didn't find installation instructions. I did look at the sticky message on the P2 but didn't see a link to the instructions there either. Can someone point me in the right direction? Thanks!!

Edit: Never mind. I figured it out. Sorry for the distraction!

Baggers · 2013-10-01 05:20

Awesome work Chip! I can't wait to give this a try later.

ctwardell · 2013-10-01 06:13

Chip,

Those are some really nice changes.

I think we need to call you "The Wizard of Menlo Drive". :-)

Chris Wardell

Rayman · 2013-10-01 06:15

Are there any DE2 boards left?

David Betz · 2013-10-01 06:17

Rayman wrote: »

Are there any DE2 boards left?

You mean DE2 add-on boards? I think Parallax may have a few of those left.

David Betz · 2013-10-01 06:24

I don't think I've ever tried my DE0-Nano add-on board. I noticed that the connector for the power supply says 7.5v. Can a 9v supply be used or is that too high? I don't have any 7.5v supplies.

Rayman · 2013-10-01 07:45

Yes, I meant the Parallax add-on board for the DE2. They were giving them out, but not sure if any are left...

The DE0 can be powered by the USB port...

David Betz · 2013-10-01 08:12

Rayman wrote: »

Yes, I meant the Parallax add-on board for the DE2. They were giving them out, but not sure if any are left...

The DE0 can be powered by the USB port...

The DE0-Nano board can be powered by USB even when plugged into the add-on board? Will all of the add-on board interfaces like video still work?

The last I heard Parallax still had a few DE2-115 add-on boards available.

Edit: I assume you still need a PropPlug though, right? The USB interface only supplies power. You can't use it to talk to the P2 serial pins.

Leon · 2013-10-01 08:12

The manual states 3.6 V to 5.7 V.

potatohead · 2013-10-01 08:27

Sweet!

Time to play tonight! Awesome Chip. Good grief, how many hours were you up?

David Betz · 2013-10-01 08:29

cgracey wrote: »

- There are two serial subsystems per cog now that operate from 1 clock per bit to 65535 clocks per bit. They do 8N1, both positive and negative polarity, as well as 32N1, negative polarity, for Prop-to-Prop comms. There's also an extra 4-bit ID that can be turned on for automatic message discrimination on the receiver side.

This is quite nice and will work well for prop-to-prop communications but is it possible to make it flexible enough to support fast SPI? That will let the P2 talk to other non-Propeller chips. I remember when Andre' LaMothe did his Chameleon boards one of the bottlenecks was the slow SPI performance on the Propeller. Can the new serial subsystem support SPI?

jazzed · 2013-10-01 08:42

Hi Chip,

It's a bit early for me, so pardon me if these are stupid questions.

Is it possible to arrange for a clock that would toggle during the SERA/B receive and transmit streams only during data times (not in the start/stop bit)? This could support a fast SPI implementation.

Is it something we could do with a counter and waitvid? Maybe it could just be added to the SERA/B configuration? Setting the start/stop bit as optional might be just as effective.

Thanks for SERA/B !

--Steve

Rayman · 2013-10-01 08:42

I just used 2 USB cables... One for power and one with propplug...

Ariba · 2013-10-01 08:59

Thank you Chip

great work
Now I know why you were so quiet for a week ..

Andy

Ken Gracey · 2013-10-01 08:59

Hey all,

I've encouraged Chip to post new Propeller binaries in this new thread, and to do the same for future releases.

The benefit of the "Propeller 2 Blog" is for updates on our planning, process and features, but I've run into too many Propeller 2 developers who've wandered into old, long threads and simply emerge confused because of dated information. The use of a new thread distinguishes new key releases, particularly for Propeller 2 binaries. If we can keep this mode of operation in mind I think we might be a bit more organized.

Chip, excellent work on the release. The Rocklin troops are jumping up and down!

Ken Gracey

pedward · 2013-10-01 09:19

The UARTs are nothing short of an amazing change in the P2. I talked to Chip and it was obvious why: He replaced the old "prop to prop" communications with these per-COG UARTs because the old chip2chip stuff created clocking restrictions in the synthesized core. There was a clock tree that had to make its way around the chip just for that feature, which did a lot to mess with the synthesis. As a rule, any service that is part of a COG individually can be synthesized much easier than something which is a "chip" feature. Ironically, the cheaper way to go is multiply all features by 8! So pretty much every COG has all these really cool bits because it would be a complete mess trying to centralize them and marshal access to them and keep the clock restrictions happy.

BTW, I already bent his ear about adding a synchronous clock for the UARTs and having the ability to turn off the start and stop bits, so basically you can then use it as a serial shift register with an automatic clock generation.

Phil Pilgrim (PhiPi) · 2013-10-01 09:24

Thanks, Chip! I'm very much encouraged by the new handling of waits during multitasking and the fact that you distinguish between waits on terminal vs. transient conditions (which might get missed if they don't stall the pipeline).

The serial stuff, though, has me scratching my head a bit. As are others, I'm curious why you chose to implement a UART rather than a more general serializer/deserializer. It seems to buck the philosophy of not including specialized hardware peripherals.

-Phil

Seairth · 2013-10-01 09:43

pedward wrote: »

BTW, I already bent his ear about adding a synchronous clock for the UARTs and having the ability to turn off the start and stop bits, so basically you can then use it as a serial shift register with an automatic clock generation.

How receptive was he? This would be great!

Bill Henning · 2013-10-01 09:52

Sapieha sent me the link... and I am just digesting the new post, docs, etc, but I'd like to add a "me too" for SPI support to the serial feature.

First of all - FANTASTIC changes Chip!

I have not gotten to the serial port part of the documentation, so take the following with a grain of salt...

Ideally,

- TX should have the ability to output the bit clock to a pin (this is implied to exist by the 1 bit per clock mode)
- RX should have the ability to shift on an input pin used as a clock (this is implied to exist by the 1 bit per clock mode)
- polarity of clock should be settable (i.e. clock on rising/falling edge)

I like pedward's suggestion of optionally suppressing the clock output for SPI

For USB, differential input and output support would be GREAT

Bill Henning · 2013-10-01 09:56

I just got to the stack ram section in the new zip, and (for Prop 3) the following two additions would be great:

RDSTACK cog_dst, stack_src/#stack_addr

WRSTACK cog_src, stack_dst/#stack_addr

Currently it takes two instructions to use a stack location as a temp var (unless you push/pull). This would add fast random-access spare registers, so even if these had to take two cycles, it would save valuable cog RAM.

cgracey · 2013-10-01 09:56

Phil Pilgrim (PhiPi) wrote: »

Thanks, Chip! I'm very much encouraged by the new handling of waits during multitasking and the fact that you distinguish between waits on terminal vs. transient conditions (which might get missed if they don't stall the pipeline).

The serial stuff, though, has me scratching my head a bit. As are others, I'm curious why you chose to implement a UART rather than a more general serializer/deserializer. It seems to buck the philosophy of not including specialized hardware peripherals.

-Phil

Well, I'm always needing UARTs. The SPI stuff can always be bit-banged without precise timing concerns, unlike asynchronous serial. So, async demands a hardware solution. The concept of 1-wire async serial, where you have a rest state, a notifying start state, data bits, and a return to a rest state, is something pure and simple - universally useful and irreducible. All both parties need to know is rate and number of bits.
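
The frame described here (rest state, start state, data bits, return to rest) can be sketched as a list of line levels. This is an illustration only: it assumes the conventional UART arrangement of idle-high, a low start bit, data sent LSB first, and a high stop bit, with the opposite polarity modeled by inverting every level. The bit order and function names are assumptions, not taken from the Prop2 docs.

```python
def frame(value, nbits=8, inverted=False):
    """Line levels for one N1 frame: start bit, nbits of data (LSB first), stop bit."""
    bits = [0] + [(value >> i) & 1 for i in range(nbits)] + [1]
    return [b ^ 1 for b in bits] if inverted else bits

def unframe(levels, nbits=8, inverted=False):
    """Recover the value from one frame, checking the start and stop bits."""
    bits = [b ^ 1 for b in levels] if inverted else list(levels)
    if len(bits) != nbits + 2 or bits[0] != 0 or bits[-1] != 1:
        raise ValueError("framing error")
    return sum(b << i for i, b in enumerate(bits[1:-1]))

print(frame(0x41))                               # 8N1 frame for 'A'
print(hex(unframe(frame(0xDEADBEEF, 32), 32)))   # 32N1 round trip
```

Both ends only need to agree on `nbits` and the bit period, which is the point being made about rate and number of bits.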

I'm not opposed to expanding the serial circuits to do a lot more, but I'm not sure what, specifically, they ought to be doing. I know SPI is desired, but can someone lay out a case for it, and describe, essentially, what is wanted? I just need to get the gumption and then it will happen.

Bill Henning · 2013-10-01 10:05

Basically four targets:

1) SPI at speeds greater than clkfreq/4 (the maximum when bit-banging, which also ties up the whole cog; clkfreq/2 with counters is possible, but not easy, and still ties up the clock). Fast SPI parts are becoming very prevalent; 104MHz flash is easy to get, maybe faster (I have not searched).

With hardware support, a cog need not be tied up, and would be 2x-4x faster as well.

2) USB (needs differential in/out)

A lot simpler software; could run as a task instead of a whole cog

3) With differential I/O, 10Mbps Ethernet as a task (Manchester encoding in hardware may be too much to ask for)

4) high speed RS485/422 as a task
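
To put rough numbers on the first target, the sketch below compares transfer times at clkfreq/4, clkfreq/2, and a hypothetical hardware shifter running at one clock per bit. It assumes an 80MHz clock and a 64KB transfer chosen purely as an example; the function name is invented, and the one-clock-per-bit SPI mode is the feature being requested, not something that exists in this release.

```python
def spi_transfer_ms(num_bytes, clkfreq_hz, clocks_per_bit):
    """Milliseconds to shift num_bytes at a given number of clocks per bit."""
    return num_bytes * 8 * clocks_per_bit / clkfreq_hz * 1000

CLKFREQ = 80_000_000   # assumed system clock
SIZE = 64 * 1024       # example transfer size in bytes

for label, cpb in [("bit-banged, clkfreq/4", 4),
                   ("counters, clkfreq/2", 2),
                   ("hypothetical hardware, clkfreq/1", 1)]:
    print(f"{label}: {spi_transfer_ms(SIZE, CLKFREQ, cpb):.2f} ms")
```

The 2x-4x figure falls straight out of the clocks-per-bit ratio; the larger gain is that a hardware shifter would leave the cog free to do other work.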

cgracey wrote: »

Well, I'm always needing UARTs. The SPI stuff can always be bit-banged without precise timing concerns, unlike asynchronous serial. So, async demands a hardware solution. The concept of 1-wire async serial, where you have a rest state, a notifying start state, data bits, and a return to a rest state, is something pure and simple - universally useful and irreducible. All both parties need to know is rate and number of bits.

I'm not opposed to expanding the serial circuits to do a lot more, but I'm not sure what, specifically, they ought to be doing. I know SPI is desired, but can someone lay out a case for it, and describe, essentially, what is wanted? I just need to get the gumption and then it will happen.
