Big update for DE2-115 and DE0-Nano users w/add-on boards
cgracey
Posts: 14,151
Here's an update for you guys with Terasic boards.
A lot of enhancements have been made to the Verilog:
- Many instructions which used to need polling in multitasking now jump back to themselves without stalling the pipeline, until their wait condition is met. In single-task situations, they still stall the pipeline, just as they used to:
WAITVID/GETMULL/GETMULH/GETDIVQ/GETDIVR/GETSQRT/GETQX/GETQY/GETQZ/SYNCTRA/SYNCTRB
- GETPIX now handles 5:5:5 pixels (two in D), as well as 8:8:8.
- CALLD now figures out the return address based on the number of same-task instructions in the pipeline.
- SETTASK now uses leading %00's to determine the task loop length.
- SARACCA/SARACCB/SARACCS replace the old FITACCA/FITACCB/FITACCS.
- ESWAP4/ESWAP8 (endian swaps) now take the place of the old SEUSSF/SEUSSR.
- There are new immediate modes for QSINCOS.
- REPS/REPD now properly shut off if their task was affected by a JMPTASK instruction.
- There are two serial subsystems per cog now that operate from 1 clock per bit to 65535 clocks per bit. They do 8N1, both positive and negative polarity, as well as 32N1, negative polarity, for Prop-to-Prop comms. There's also an extra 4-bit ID that can be turned on for automatic message discrimination on the receiver side.
- The following instructions have moved: SETTASK/CLRACCA/CLRACCB/CLRACCS/CACHEX/SETXCH/SETXFR/SETSKIP
- The following instructions have been removed: FITACCA/FITACCB/FITACCS/SEUSSF/SEUSSR
- The following instructions have been added: SERINA/SERINB/ESWAP4/ESWAP8/SARACCA/SARACCB/SARACCS/SEROUTA/SEROUTB/SETPERA/SETSERA/SETPERB/SETSERB
See the Prop2_Docs.txt file and the Prop2_Instructions.txt file for all the details.
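For a rough feel of what the 1..65535 clocks-per-bit range means at the boards' 80MHz clock, here is a small Python sketch. It is not taken from Prop2_Docs.txt: the function names are made up for illustration, and the 8N1 frame follows the usual async convention (start bit low, data LSB first, stop bit high). How that maps onto the "positive" and "negative" polarity settings is an assumption, modelled here with a plain `inverted` flag.

```python
CLOCK_HZ = 80_000_000  # 80MHz emulation-board clock mentioned in the post

def bit_rate(clocks_per_bit, clock_hz=CLOCK_HZ):
    """Bits per second for a clocks-per-bit setting in the stated 1..65535 range."""
    if not 1 <= clocks_per_bit <= 65535:
        raise ValueError("clocks_per_bit must be in 1..65535")
    return clock_hz / clocks_per_bit

def clocks_for_baud(baud, clock_hz=CLOCK_HZ):
    """Nearest clocks-per-bit setting for a target baud rate, clamped to the range."""
    n = round(clock_hz / baud)
    return min(max(n, 1), 65535)

def frame_8n1(byte, inverted=False):
    """Line levels for one conventional 8N1 frame.

    Start bit (0), eight data bits LSB first, stop bit (1).
    'inverted' flips every level; which setting the hardware calls
    positive or negative polarity is an assumption, not from the docs.
    """
    bits = [0] + [(byte >> i) & 1 for i in range(8)] + [1]
    return [b ^ 1 for b in bits] if inverted else bits

if __name__ == "__main__":
    print(bit_rate(1))              # fastest setting
    print(bit_rate(65535))          # slowest setting
    print(clocks_for_baud(115200))  # setting closest to 115200 baud
    print(frame_8n1(0x55))
```

By this arithmetic the range runs from 80 Mbit/s down to roughly 1.2 kbit/s at 80MHz, and the 2M baud loader rate corresponds to 40 clocks per bit.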
Other improvements:
- The ROM monitor now supports ";" comments.
- PNUT uses a built-in 2M baud loader for large apps. Now you can ORGH all programs to $E80 that get downloaded with F11. PNUT figures out if you have a DE0-Nano and only loads the bottom 32KB.
- Spin2 is pretty much done, but not thoroughly tested yet. I need to document it. The cog RAM from $000..$139 is free for (callable or multitasking) PASM. See serialio.spin.
- Both the DE2-115 and DE0-Nano now operate at 80MHz. After loading a large app (F11/F12), the cogs are running at 80MHz. After downloading a loader (F10), they are going 20MHz. See SDRAM_Graphics6 to see CLKSET ($FF) switch the clock up to 80MHz.
Here's the file:
Prop2_Emulation_Boards.zip
Most of these improvements are a result of your involvement in this process. Thanks for all your help, Everyone! I'm thinking the Verilog is stable again now. I must expand the test suite to accommodate the newer things, because in two weeks we are going back to synthesis.
Comments
You've been busy!
All us owners of DE boards had better get testing this stuff pronto. Only two weeks available.
That performance boost might get ospropdev's video glitches sorted out.
That sounds really cool. What a neat idea. How are the dual abilities done, though?
When SETTASK executes with a non-0 value, you're in multi-tasking mode. There is a single flipflop that captures the OR of all SETTASK's D/#n bits and if it's "1", then those certain instructions become jumps to themselves until their condition is met. Otherwise, they act as they used to, stalling the pipeline until their condition is met.
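As a way of picturing that, here is a toy Python model of the behaviour described. It is only a sketch, not the Verilog: the `Cog` class and its method names are invented for illustration, and it assumes the flag is simply recaptured from the operand each time SETTASK executes.

```python
class Cog:
    """Toy model of the single multitasking flip-flop described above."""

    def __init__(self):
        self.multitasking = False  # flip-flop starts clear (assumption)

    def settask(self, d):
        # The flag captures the OR of all bits of SETTASK's D/#n operand,
        # so any non-zero value means multitasking mode.
        self.multitasking = (d != 0)

    def step_wait_instruction(self, pc, condition_met):
        """Model one attempt at a waiting instruction such as GETMULL.

        Returns (next_pc, pipeline_stalled).
        """
        if condition_met:
            return pc + 1, False   # done, move on
        if self.multitasking:
            return pc, False       # jump back to itself, no stall
        return pc, True            # single-task: stall the pipeline

if __name__ == "__main__":
    cog = Cog()
    print(cog.step_wait_instruction(5, False))  # single-task, not ready
    cog.settask(0b01)
    print(cog.step_wait_instruction(5, False))  # multitasking, not ready
    print(cog.step_wait_instruction(5, True))   # condition met
```

In the multitasking case the program counter stays put but the pipeline keeps moving, which is what lets the other tasks continue to run while one task waits.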
You have been busy....
Cheers
Brian
Regarding the serial, I was thinking of a simpler more general serial. I have not read your docs yet.
What I was generally after was just a serialiser and deserialiser using flexible clocks as you have described. If they were able to rx/tx a serial stream, we could control this by software (by specifying start/parity/stop bits by software as required). One of the particular uses I had in mind was for USB Full Speed.
So, would it be possible to have a mode that is 32 bits long without start/stop bits inserted by hardware? The tx side would just shift/output a bit each set of 'n' clocks. It just keeps going even in an underrun situation. That way we can load the tx register by software when required. The rx side would act similarly, shift/input a bit each set of 'n' clocks. Likewise, it just keeps going even in an overrun situation. Software would be responsible to read at the appropriate time. Now we could do NRZ and NRZI too.
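To make the request concrete, here is a minimal Python sketch of what such a raw mode plus software NRZI might look like. This is a hypothetical model, not anything in the current Verilog: `shift_out`, `nrzi_encode` and `nrzi_decode` are illustrative names, and LSB-first shifting is an assumption (it matches USB bit order). The NRZI rule used is the USB one, where a 0 toggles the line and a 1 leaves it alone.

```python
def shift_out(word, clocks_per_bit, nbits=32):
    """Yield (clock_tick, bit) pairs for a raw shifter with no start/stop bits.

    One bit goes out every clocks_per_bit ticks, LSB first (assumed order).
    """
    for i in range(nbits):
        yield i * clocks_per_bit, (word >> i) & 1

def nrzi_encode(bits, level=1):
    """USB-style NRZI: a 0 bit toggles the line level, a 1 bit keeps it."""
    out = []
    for b in bits:
        if b == 0:
            level ^= 1
        out.append(level)
    return out

def nrzi_decode(levels, prev=1):
    """Inverse of nrzi_encode: no change decodes to 1, a change decodes to 0."""
    out = []
    for lv in levels:
        out.append(1 if lv == prev else 0)
        prev = lv
    return out

if __name__ == "__main__":
    sync = [0, 0, 0, 0, 0, 0, 0, 1]   # example pattern: seven 0s then a 1
    line = nrzi_encode(sync)
    print(line)
    print(nrzi_decode(line) == sync)
    print(list(shift_out(0b101, 4, nbits=3)))
```

With the hardware doing only the timed shifting, framing, bit stuffing and encoding like this would all stay in software.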
Postedit: Ignore the following as there are 2 serialisers, A & B.
USB is actually 2 pins (both inputs or both outputs). They are normally inverted relative to each other; having both at the same polarity is a special case. This would require external gating.
Would it be possible to tx on 2 pins, and have control for inverting or not inverting the second output pin?
Edit: Never mind. I figured it out. Sorry for the distraction!
Those are some really nice changes.
I think we need to call you "The Wizard of Menlo Drive". :-)
Chris Wardell
The DE0 can be powered by the USB port...
The last I heard Parallax still had a few DE2-115 add-on boards available.
Edit: I assume you still need a PropPlug though, right? The USB interface only supplies power. You can't use it to talk to the P2 serial pins.
Time to play tonight! Awesome Chip. Good grief, how many hours were you up?
It's a bit early for me, so pardon me if these are stupid questions.
Is it possible to arrange for a clock that would toggle during the SERA/B receive and transmit streams only during data times (not in the start/stop bit)? This could support a fast SPI implementation.
Is it something we could do with a counter and waitvid? Maybe it could just be added to the SERA/B configuration? Setting the start/stop bit as optional might be just as effective.
Thanks for SERA/B !
--Steve
great work
Now I know why you were so quiet for a week ..
Andy
I've encouraged Chip to post new Propeller binaries in this new thread, and to do the same for future releases.
The benefit of the "Propeller 2 Blog" is for updates on our planning, process and features, but I've run into too many Propeller 2 developers who've wandered into old, long threads and simply emerge confused because of dated information. The use of a new thread distinguishes new key releases, particularly for Propeller 2 binaries. If we can keep this mode of operation in mind I think we might be a bit more organized.
Chip, excellent work on the release. The Rocklin troops are jumping up and down!
Ken Gracey
BTW, I already bent his ear about adding a synchronous clock for the UARTs and having the ability to turn off the start and stop bits, so basically you can then use it as a serial shift register with an automatic clock generation.
The serial stuff, though, has me scratching my head a bit. As are others, I'm curious why you chose to implement a UART rather than a more general serializer/deserializer. It seems to buck the philosophy of not including specialized hardware peripherals.
-Phil
How receptive was he? This would be great!
First of all - FANTASTIC changes Chip!
I have not gotten to the serial port part of the documentation, so take the following with a grain of salt...
Ideally,
TX should have the ability to output the bit clock to a pin (this is implied to exist by the 1 bit per clock mode)
RX should have the ability to shift on an input pin used as a clock (this is implied to exist by the 1 bit per clock mode)
polarity of clock should be settable (ie clock on rising/falling edge)
I like pedward's suggestion of optionally suppressing the clock output for SPI
For USB, differential input and output support would be GREAT
RDSTACK cog_dst, stack_src/#stack_addr
WRSTACK cog_src, stack_dst/#stack_addr
Currently it takes two instructions to use a stack location as a temp var (unless you push/pull). This would add fast random-access spare registers, so even if these had to take two cycles, it would save valuable cog RAM.
Well, I'm always needing UARTs. The SPI stuff can always be bit-banged without precise timing concerns, unlike asynchronous serial. So, async demands a hardware solution. The concept of 1-wire async serial, where you have a rest state, a notifying start state, data bits, and a return to a rest state, is something pure and simple - universally useful and irreducible. All both parties need to know is rate and number of bits.
I'm not opposed to expanding the serial circuits to do a lot more, but I'm not sure what, specifically, they ought to be doing. I know SPI is desired, but can someone lay out a case for it, and describe, essentially, what is wanted? I just need to get the gumption and then it will happen.
1) SPI at speeds greater than clkfreq/4 (the bit-banging maximum, which also ties up the whole cog; clkfreq/2 with counters is possible, but not easy, and still ties up the clock). Fast SPI parts are becoming very prevalent; 104MHz flash is easy to get, maybe faster (I have not searched).
With hardware support, a cog need not be tied up, and would be 2x-4x faster as well.
2) USB (needs differential in/out)
Much simpler software, could run as a task instead of a whole cog
3) with differential i/o, 10Mbps ethernet as a task (Manchester encoding in hardware may be too much to ask for)
4) high speed RS485/422 as a task
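To put numbers on point 1, here is a quick Python sketch of the arithmetic at the 80MHz the boards now run at. The clkfreq/4 and clkfreq/2 figures are the ones quoted above; the 1-clock-per-bit hardware figure is an assumption that a synchronous mode could run as fast as the serial subsystems' fastest setting, which nobody has confirmed.

```python
CLOCK_HZ = 80_000_000  # current emulation-board clock

def spi_rates(clock_hz=CLOCK_HZ):
    """Bit rates in bits/s for the three approaches discussed above."""
    return {
        "bit_banged": clock_hz // 4,          # clkfreq/4, ties up the cog
        "counter_assisted": clock_hz // 2,    # clkfreq/2 with counters
        "hardware_1_clock_per_bit": clock_hz, # hypothetical hardware mode
    }

def speedup_over(rates, baseline):
    """How many times faster the hypothetical hardware mode is than a baseline."""
    return rates["hardware_1_clock_per_bit"] / rates[baseline]

if __name__ == "__main__":
    rates = spi_rates()
    for name, bps in rates.items():
        print(f"{name}: {bps / 1e6:.0f} Mbit/s")
    print(speedup_over(rates, "bit_banged"))
    print(speedup_over(rates, "counter_assisted"))
```

Under that assumption the numbers line up with the 2x-4x estimate: 20 and 40 Mbit/s in software versus 80 Mbit/s in hardware.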