I was looking today at the Altera Cyclone V devices. The biggest one is the A9 and it has 301k LE's for $220.00 from Digi-Key. We could build a board around it that would be good for 8 cogs and Prop III development, too. We could recreate the delta-sigma ADC pin functions by using two adjacent FPGA pins and having one provide negative feedback through a resistor. We'd use the flipflops in the pin circuits to make the turn-around minimal. I'm sure more could be done, as well. We'd probably have to sell that board for upwards of $600. It could enable open Verilog development of Prop III. I suppose we could get all the Verilog worked out for Prop III under MIT license terms. Anyone could use the core, or even make an ASIC, but we would use it in an ASIC with friendly I/O pins.
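A rough behavioural sketch of the two-pin delta-sigma idea, in Python rather than Verilog: the flip-flop in one pin samples the summing node, the adjacent pin feeds the result back with the opposite sense, and the density of 1s tracks the input. The function name, the normalised 0..1 input range and the ideal integrator are all assumptions for illustration, not the actual Prop pin circuit.

```python
def delta_sigma_density(vin, samples):
    """First-order delta-sigma model. vin is normalised to 0.0..1.0.
    Returns the fraction of clock cycles on which the input flip-flop read 1."""
    acc = 0.0      # charge on the summing node (ideal integrator, an assumption)
    ones = 0
    for _ in range(samples):
        bit = 1 if acc >= 0.0 else 0   # flip-flop in the input pin samples the node
        acc += vin - bit               # feedback pin pulls the node the other way
        ones += bit
    return ones / samples

print(delta_sigma_density(0.25, 4000))
```

Counting the 1s over a fixed window gives the conversion result; a longer window gives more resolution at a lower sample rate.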
I'd wait to have this discussion until after P2 has been fielded for a bit. Considering how much has changed from P1, who knows what we'll want in the P3!
Pins 92..95 don't do anything and read '0'. We could make them read something else, of course. Pins 96..127 are the inter-cog I/O's which also connect to ALL the peripherals, so they are available for all kinds of applications inside the chip.
Maybe these could simply be wired back to themselves, so that each cog has 4 bits they can read/write to (e.g. for inter-task signaling, simple state machines, etc).
Or maybe just wire them up to output noise, which can be very useful in DSP.
Demos working on my Nano. I am using the Quartus 12.2.1 programmer for configuration. The programmer works a little differently now? You have to select the target in the lower window as well as clicking on the check box.
I'm working on getting the DE2-115s in stock, along with our adapter board.
Ken Gracey
Hi Ken,
If you end up making another DE2-115 adapter board, could you make it a little wider with mounting holes so it can be bolted to the DE2-115 using the mounting holes on either side of the connector? I'm always nervous now that it will come unplugged especially if I have cables plugged into it.
I second this. Mine is propped up by some odd things I found to fit in there. The whole thing is a little fiddly.
@Ale: I've not been able to attempt that yet. This short week is rough. Almost done though. I would appreciate seeing your changes, when you are inclined to share them.
Chip, instead of delta-sigma, would it be possible to use a SAR ADC? There are currently several 16-bit ADCs at 1 MSPS that interface over SPI. I think it would be an incredibly useful IC (in the industrial area) if you could get something as easy to use as the first P1, with a low pin count and small package (40 or 48 pins VQFP?) and several 16-bit ADCs at high speed (around 1 MSPS).
Last night I updated my DE0-Nano and the included demos work fine.
Getting my own old code to work on the new emulation turned out to be not so easy, until I found all the little and big changes in the instructions and register settings. But I have already been able to port 3 or 4 of my demos.
What I can't get to work is the fractal demo. After some debugging I think DIV32 does not work as it should.
I made some tests and it looks like DIV32 does not work correctly with a negative numerator: the resulting quotient is always 0 in this case (and the remainder has the value of the numerator).
I thought I could use MUL64 for now, but PNUT does not even accept this mnemonic.
Andy
You found two problems, all right, and would have found a third. I had some Spin vs PASM naming conflict over DIV64/DIV64U, so it wasn't assembling. I also had documented DIV64D/DIV64/DIV64U wrong - the DIV64D needs to be executed before DIV64/DIV64U. And, I had a logic error involving negation in the divider that was causing DIV32 to not work right. I've got these fixed and I'm now recompiling the DE0/DE2 configuration files. I'll post a new .zip at the top of this thread in an hour, or so. I don't know how many times I'll have to recompile for the DE0-Nano before I get a fast fit. Thanks for finding these problems, Andy!
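For anyone who wants to check the new configuration files, here is a sketch of the usual way a signed divide is built on an unsigned one, which is where a negation slip can hide. It is Python, the truncate-toward-zero convention and the name div32_reference are assumptions, and nothing here is taken from the actual divider logic.

```python
def div32_reference(numerator, denominator):
    """Signed divide built on an unsigned divide of the magnitudes.
    Assumed convention: quotient truncates toward zero, remainder takes the
    sign of the numerator. The 32-bit range is not checked."""
    if denominator == 0:
        raise ZeroDivisionError("denominator is zero")
    q, r = divmod(abs(numerator), abs(denominator))   # unsigned core
    if (numerator < 0) != (denominator < 0):
        q = -q                                        # negate quotient on mixed signs
    if numerator < 0:
        r = -r                                        # remainder follows the numerator
    return q, r

print(div32_reference(-7, 2))
print(div32_reference(7, -2))
```

With a negative numerator the quotient should be non-zero whenever the magnitude of the numerator is at least that of the denominator, so a result of 0 with the remainder equal to the numerator is easy to spot against this reference.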
Well, that's a bummer. Quartus goes from free to $3k/year. That won't work for everybody. We'll have to scale back.
Chip, when you have absolutely nothing else to do :) ... could you post your serial out version of the Balls demo? I'm working on a Prop2Mac serial terminal with graphics ... it works on my Mac, but I don't have a source for the 1M baud rate and I am not totally confident that the higher serial rates are going to be supported. And I have to add decoding for your picture files. Currently Prop2Mac is configured to work with 8 bit gray text files only.
Thanks
Rich
I just updated the .zip in the first post of this thread.
The new PNut.exe handles DIV64/DIV64U properly.
The new FPGA configuration files fix the DIV32 bug.
New Prop2_Docs.txt properly documents DIV64/DIV64U now.
Chip,
Could you please consider adding this instruction? It would help immensely with my P2 USB FS bit-banging. Hopefully it's only a couple of lines of Verilog.
Currently I just don't have quite enough time to do the bit-bang receive loop, so I am stuck.
GETXP [#]D [WZ],[WC] ' (similar to GETP & GETNP) where D = PINx (pin no 0..127)
C = C XOR PINx
Z = ! ( PINx OR (PINx XOR #1)) 'ie ZERO if both pin pairs are ZERO, else NON-ZERO
Note1: PINx and (PINx XOR #1) are a pair of adjacent pins.
... If [#]D = PINx = nnnnnn0 (even) then (PINx XOR #1) = nnnnnn1; If [#]D = nnnnnn1 (odd) then (PINx XOR #1) = nnnnnn0
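To pin down the proposed semantics, here is a small Python model of GETXP as written above. The function name and the idea of passing all 128 pins as one integer are for illustration only; the real thing would be Verilog in the cog.

```python
def getxp(pins, d, c):
    """Model of the proposed GETXP.
    pins: integer whose bit n is the state of pin n (0..127)
    d:    pin number; its partner is d XOR 1
    c:    C flag before the instruction
    Returns (new_c, z)."""
    pin = (pins >> d) & 1
    partner = (pins >> (d ^ 1)) & 1
    new_c = c ^ pin                    # C = C XOR PINx
    z = int(not (pin or partner))      # Z set only when both pins of the pair are 0
    return new_c, z

pins = 1 << 43                         # pin 43 high, pin 42 low
print(getxp(pins, 43, 0))
print(getxp(pins, 42, 0))
print(getxp(0, 42, 1))
```

The last call is the SE0-style case: both pins of the pair read 0, so Z comes back set while C keeps its previous value.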
Background: In USB we have 2 I/O pins (D- and D+). NRZI is used.
When LS (low speed 1.5Mbps) is used, the "J" state pin is the D- pin and the "K" state pin is the D+ pin.
When FS (full speed 12Mbps) is used, J & K state pins are swapped. ie "J" is now on D+ and "K" is now on D-.
So for software, it is easier to refer to the J & K state pins and swap the D- and D+ pins over.
While bit-banging on receive, we need to test for a K state change (actually both J & K change and are opposite) and we also need to check for the special condition SE0 (both J & K = 0).
As a way of accumulating the condition of bit-stuffing we get each data bit and XOR it with the previous data bit. By holding the previous data bit in the C flag, and XORing it with the new data bit and placing the result back into the C flag, we can now accumulate the data bits by shifting the C flag into the data bits register. For the bit-stuffing counter, we shift the C flag into a 32-bit counter 6 times (rcl counter,#6 wz). If the resulting 32-bit counter = 0 then we need to unstuff the next bit.
The above instruction would replace 4 x P2 instructions.
An opcode similar to GETP & GETNP (say S=x11100011 which is the next available after CACHEX) could work.
BTW I don't need pnut modified for now - I can hard code into a long to get me running.
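For readers not familiar with the USB line coding, a minimal Python sketch of NRZI with bit stuffing follows. It uses the standard USB rules (a 0 is sent as a level change, a 1 as no change, and a 0 is stuffed after six consecutive 1s). It is not the PASM receive loop described above; the function names and the single-bit "level" representation are assumptions for illustration.

```python
def nrzi_encode_stuff(bits, idle=1):
    """Encode data bits to line levels, stuffing a 0 after six 1s in a row."""
    level, ones, out = idle, 0, []
    for bit in bits:
        if bit == 0:
            level ^= 1                 # a 0 is sent as a transition
        out.append(level)
        ones = ones + 1 if bit else 0
        if ones == 6:
            level ^= 1                 # stuffed 0 forces a transition
            out.append(level)
            ones = 0
    return out

def nrzi_decode_unstuff(levels, idle=1):
    """Decode line levels back to data bits, dropping stuffed bits."""
    prev, ones, out = idle, 0, []
    for level in levels:
        bit = 1 if level == prev else 0    # no change = 1, change = 0
        prev = level
        if ones == 6:
            if bit != 0:
                raise ValueError("bit-stuff error")
            ones = 0                       # discard the stuffed 0
            continue
        out.append(bit)
        ones = ones + 1 if bit else 0
    return out

data = [1] * 7 + [0, 1]
line = nrzi_encode_stuff(data)
print(line)
print(nrzi_decode_unstuff(line) == data)
```

The XOR of the current and previous level is exactly what the C flag accumulates in the proposal, and the run counter plays the role of the shifted bit-stuffing register.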
We can do this!!!
Do you need the mux path to be two pins wide? Let's get this hashed out and I'll implement it. Sounds good!
I can answer.. Yes, the MUX is for pin pairs, and I'd write it like this
Z = ! ( PIN(x) OR PIN(y)) ( pin number y= pin number x XOR #1)
That may sound a little complex to MUX, but it means the x can define which of D+,D- to use as data bit.
In words, Upper mux bits choose a Pin-pair set and the lower mux bit selects which of those 2 bits, feeds C = C XOR PINx (Z uses both pins, order does not matter)
( The mind boggles at how many USB ports might spawn out of P2...)
At 80MHz there is only 6.7 clocks (instructions) per bit.
But P2 will run >160MHz, maybe 200MHz
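The arithmetic behind those numbers, as a quick Python check. It assumes one instruction per clock, as the figure above does, and the helper name is just for illustration.

```python
USB_FS_BIT_RATE = 12_000_000   # full speed, 12 Mbps

def clocks_per_bit(sysclk_hz, bit_rate_hz=USB_FS_BIT_RATE):
    """System clocks available per USB bit time."""
    return sysclk_hz / bit_rate_hz

for mhz in (80, 160, 200):
    print(f"{mhz} MHz: {clocks_per_bit(mhz * 1_000_000):.1f} clocks per bit")
```

That works out to about 6.7 clocks per bit at 80 MHz, 13.3 at 160 MHz and 16.7 at 200 MHz.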
Does this time budget include edge-re-sync ?
Not currently. Provided the packet (end point block) size is restricted, it is unnecessary. There may be time by unravelling sections.
FYI There is a P1 implementation for USB FS but requires 2 cogs just to perform the bit-banging receive (deserialiser) only.
I am not shooting for fully compliant, just working compliant code.
The Logic side is obviously fairly easy, the new pin-pair pathway could need a respin of the pin-mux handling.
I don't see the need for any new pathways. There are current instructions that take up to 32 input pins and can and/or/etc with them. There are single pin (1..127) instructions also.
The only part necessary AFAIK is to disregard the lowest pin bit, and read both pins with the lowest bit both 0 & 1. perhaps I did not explain this part adequately.
So for example, if you had D=43 then you would use Pin#43 for the C part, and both pins 42 & 43 for the Z part. If you had D=42, then Pin#42 would be the C part and both 42 & 43 for the Z part.
Chip has also said there is provision for complementary pin pairs in the pin matrix, so this might also be useful.
BTW I don't mind if there was a restriction to a single group of 32 pins, say fixed to the PINA/OUTA/DIRA group.
Another quite useful instruction could be
GETPPZC [#]D [WZ],[WC]
where [#]D specifies a pin pair
If the pin pair is even (lowest pin specified) the pin pair [1:0] are put into Z/C via WZ/WC (similar to SETZC)
If the pin pair is odd (highest pin specified) the pin pair [0:1] are put into Z/C via WZ/WC
This immediately decodes the 4 possible states into Z/C.
This could be implemented using the existing GETP/GETNP instructions
where [#]D specifies either a pin or pin pair
If WZ=1 or WC=1 then it functions as it does now (puts the pin value into Z and/or C)
If WZ/WC=00 (an otherwise useless instruction) then it functions as...
... uses a pin pair and if the pin pair is even (lowest pin specified) the pin pair [1:0] are put into Z/C
... uses a pin pair and if the pin pair is odd (highest pin specified) the pin pair [0:1] are put into Z/C
pnut could use a pseudo op for the pair function as GET[N]PP to avoid confusion
And for the reverse
SETPPZC [#]D [WZ],[WC]
This could be implemented using SETP/SETPC/SETPNC/SETPZ/SETPNZ by utilising the WC/WZ effects...
SETPZ/SETPC and SETNPZ/SETNPC could be each combined and now use the WZ/WC flags
If WZ=1 or WC=1 then it functions as it does now (puts the Z or C value onto the pin)
If WZ/WC=00 (an otherwise useless instruction) then it functions as...
... uses a pin pair and if the pin pair is even (lowest pin specified) then Z/C are put onto the pin pair [1:0]
... uses a pin pair and if the pin pair is odd (highest pin specified) then Z/C are put onto the pin pair [0:1]
pnut could use a pseudo op for the pair function as SET[N]PP to avoid confusion
I think something similar could be done with OFFP/NOTP/CLRP/SETP using the WZ/WC
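A small Python model of one reading of the GETPPZC description: [1:0] is taken to mean the upper pin of the pair goes to Z and the lower to C, and [0:1] the reverse, which reduces to "the named pin goes to C and its partner goes to Z". That reading, the function name and the integer pin image are assumptions.

```python
def getppzc(pins, d):
    """Model of the proposed GETPPZC (pin-pair read into Z/C).
    pins: integer whose bit n is the state of pin n
    d:    pin number selecting the pair; even or odd d swaps the roles
    Returns (z, c)."""
    c = (pins >> d) & 1          # the named pin lands in C
    z = (pins >> (d ^ 1)) & 1    # its partner lands in Z
    return z, c

pins = 1 << 42                   # pin 42 high, pin 43 low
print(getppzc(pins, 42))
print(getppzc(pins, 43))
```

Naming the odd pin instead of the even one swaps which line ends up in which flag, which is how the D+/D- swap between low speed and full speed could be handled without extra instructions.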
Yes Chip. jmg has explained it better although you can also see from my last post further explanation (and more work maybe, but the logic is identical to what we need in the Z bit). By utilising the pair and the odd/even we can actually swap the inputs over simply.
I am finding the CLRP/SETP/OFFP/NOTP confusing/misleading. I went looking for a SETDIR or SETPDIR instruction to set the pin 1..127 direction.
Perhaps it's just me, but I thought I would mention it anyway. See post below for further info.
Can anyone see my bug???
The ">" is output, but then it hangs on the waitcnt so "A" is never seen.
I am getting a ~53s delay between each set ">", "AB", "AB". Is there something I am missing with the new 64bit waitcnt?
DAT
                orgh    $00E00                  ' start of hub ram
                org     0
Entry
                getcnt  waitx
                add     waitx, delay5s          ' 5 secs
                waitcnt waitx, delay1s
                setsera _serenable, _period
                CLRP    #_txpin                 ' make txpin an output, SERA drives it high
                nop
                nop
                nop
                serouta #">"
:again
                waitcnt waitx, delay1s
                serouta #"A"
                serouta #"B"
                jmp     #:again
'                       txid rxid mask r t en rxpin# en txpin#
_serenable      long    00_0000_0000_0_0_10_0000000_10_0000000 | (_rxpin<<9) | _txpin
_period         long    _bitrate
waitx           long    0
delay500ms      long    _xinfreq / 2            ' 0.5 sec
delay1s         long    _xinfreq * 1            ' 1s
delay5s         long    _xinfreq * 5            ' 5 sec
File deleted.
The problem with A9 is that there is no free Quartus edition ( See this post : http://forums.parallax.com/showthread.php/150588-Big-update-for-DE2-115-and-DE0-Nano-users-w-add-on-boards?p=1211750&viewfull=1#post1211750 )
That's an interesting idea. It kind of makes me want to put a true random number generator in the hub and feed it back through those 'pins'.
Here you go. This version outputs the R,G,B bytes at 1Mbaud. There is an initial long wait to allow the PC to get its baud rate adjusted. I suppose you'd need to do something similar for the Mac, or maybe wait for a pushbutton to start sending the bytes.
balls_tx.spin
web edition supports "Cyclone® III, Cyclone IV, and Cyclone V FPGAs (excluding 5CEA9, 5CGXC9, and 5CGTD9)"
Looks like the 150k LE 5CEA7 is supported. Should be enough for four cogs.
Don't know about pricing though.
http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=167&No=816&PartNo=1
(I am sure I remember someone mentioning this board earlier, but I don't remember who)
http://www.digikey.com/product-search/en?mpart=P0160 $310.80
Chip - the DE0-Nano has 22K LE's, and is tight for one cog
Above board (for $300) has 110K LE's - should leave room left over after four cogs
Also
- the VGA has R8G8B8 resistor dac's, should be possible to map P2 vga/component video onto it in verilog
- there is an on-board uSD socket
- there is an on-board QSPI flash chip
- LED's, switches, audio
- on board USB-UART port (looks like FTDI) possibly no need for PropPlug
-> possibly no need for add-on board?
Only issue: on-board DRAM is DDR3, so you'd have to make it behave like SDRAM in verilog.
Edit: (Ray had a good point about lack of .1" headers)
http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=67&No=322&PartNo=2
breaks out HSMC connectors to 3x 40pin (2x20) .1" pitch connectors, $55
The board Ray points out is nice as well, especially if it fit 3 cogs. Verilog could map the video DAC's to DVI.
From Scratch:
$176 qty.40 price for 149K LE A7, should fit about six cogs
http://www.digikey.com/product-detail/en/5CEBA7F27C8N/5CEBA7F27C8N-ND/3879483
Bill,
That P160 board lacks easy expansion headers.
Here is one previously discussed which fits the bill (pardon the pun) much better for $179 but only 77K LEs - this would only make a 2-3 cog version (maybe 4 with only 1 cog w video).
http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=167&No=830
When Chip has time, I would like him to look and see if he could also support this board for P2 testing with 2 cogs and >32KB hub ram. I/we can make the daughter board to save Parallax the trouble if they wish. Maybe they could get a discount on this board from Terasic for us.
For a special purpose board (for P3 development), as far as the software is concerned, I would expect that a deal could be done with Altera. If not, then I am sure Xilinx would come to the party.
Anyway, if there is more discussion on FPGA boards, can we start a new thread and keep this for the new P2 code please.
What MHz do you need to make the time ?
Does this time budget include edge-re-sync ?
The Logic side is obviously fairly easy, the new pin-pair pathway could need a respin of the pin-mux handling.
Edit: File deleted.
Thanks heaps Chip!
Was
JMPTASK #address,#mask
is now
JMPTASK #mask,#address
Ozpropdev
JMPTASK D/#,S/# force PC's in mask D/# to address S/#
I just tried PASSCNT instead and it works,
WAITCNT did not.
Weird???
Brian,
The first waitcnt works. It is the subsequent waitcnt loop that is not working properly.
Seems like the second waitcnt waits a complete 32bit loop + delay1s each time it executes as though there is a "1" being included in the upper 32bits of the extended counter.
Definitely need to do some more testing.
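A quick sanity check on the ~53 s figure, in Python. The 80 MHz system clock is an assumption here (the thread only mentions 80 MHz in the USB timing discussion), and the helper name is for illustration.

```python
def counter_wrap_seconds(clock_hz, bits=32):
    """Time for a free-running counter of the given width to wrap once."""
    return (1 << bits) / clock_hz

CLOCK_HZ = 80_000_000            # assumed system clock
wrap = counter_wrap_seconds(CLOCK_HZ)
print(f"32-bit wrap: {wrap:.1f} s, wrap + 1 s delay: {wrap + 1:.1f} s")
```

At 80 MHz a 32-bit wrap takes about 53.7 s, so a delay in the region of 53 s between outputs fits the idea that each waitcnt in the loop is waiting out a full 32-bit wrap.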