I saw a P1 thread recently where Phil was able to get a 32 kHz crystal to run using two Prop1 pins. Maybe that would help USB, if it worked with a 12 MHz crystal and two P2 pins?
The 16-bit NCO affords good accuracy, though there is short-term jitter due to rollover timing differences.
That circuit is somewhat hairy, and may not work on the P2, but USB just needs >= 48 MHz and can NCO down to 12 MHz. A 12 MHz NCO is likely easier to debug.
Well, the old games Rocky's Boots and Robot Odyssey both took a visual approach to this problem. You had a goal, sensors, logic units, etc... hook 'em up, and then let the robot loose to see what the circuit did.
All that is needed is 2D. Take the diagram Andy made and make it more detailed, so you can see bits move and see state over time, and it gets translated to Verilog... input bitstream, through the circuit, and the output bits are shown. It all happens in a little sandbox, with a clock and virtual test gear all ready to go and display.
It would be good for something like this. When the code is done and the block is working, it gets dropped into the larger project and optimized at that time.
IMHO, doing that would make a fine thesis-type research project for someone.
Re: NCO
We could add a tuning register to deliver more precision, and/or better control over the error: when it happens, and when it needs to be compensated for. Might not be a bad idea for the streamer NCO too, to improve video signals.
I'm not following ?
NCO above is used in the P1 sense, meaning an adder-based divider.
This gives finer divisions, but at the expense of jitter.
USB resyncs the adder on edges, and the same edge-reset can usefully be applied on the start-bit edge for UARTs, to allow fractional baud rates without accumulating inter-byte jitter.
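For anyone following along, here is a minimal C sketch of that adder-based divider with USB-style edge resync. It assumes a 16-bit accumulator as on the P1; all names are illustrative, not actual P2 registers.

```c
#include <stdint.h>
#include <stdio.h>

/* Adder-based divider (NCO): each system clock, a fixed step is added
 * to a 16-bit accumulator; the carry out is the divided output tick.
 * Average output frequency is sysclk * step / 65536, so non-integer
 * divisions are possible, at the cost of individual ticks landing
 * early or late by up to one sysclk (the short-term jitter). */
typedef struct {
    uint16_t acc;    /* phase accumulator */
    uint16_t step;   /* frequency word: fout/fclk * 65536 */
} nco_t;

/* Advance one system clock; returns 1 on rollover (output tick). */
static int nco_clock(nco_t *n) {
    uint32_t sum = (uint32_t)n->acc + n->step;
    n->acc = (uint16_t)sum;
    return (int)(sum >> 16);         /* carry out = tick */
}

/* Edge resync, as USB does on line transitions and as a UART could
 * do on the start-bit edge: zeroing the phase locks the next ticks
 * to the edge, so fractional-baud error cannot accumulate. */
static void nco_resync(nco_t *n) {
    n->acc = 0;
}

int main(void) {
    /* Example: 80 MHz sysclk, ~12 MHz out: 12/80 * 65536 ~= 9830 */
    nco_t n = { 0, 9830 };
    nco_resync(&n);                  /* e.g. on a line edge */
    int ticks = 0;
    for (int i = 0; i < 80000; i++)  /* 1 ms of sysclk */
        ticks += nco_clock(&n);
    printf("ticks in 1 ms: %d (ideal 12000)\n", ticks);
    return 0;
}
```

The resync is what keeps the fractional-division error bounded: the jitter stays within a byte instead of drifting across it.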
Your mention of a tuning resistor seems to relate to trim of the RCFAST, or control of the PLL VCO?
I've not seen predicted numbers on RCFAST drift and stability, or whether one of the DACs can be MUX'd across to the VCO, but some user access to trim can always be useful.
Joining a DAC to the VCO seems a low-tech way to use what already exists?
Most small MCUs with trim these days (even 30c ones) also start with a reasonable calibration of around 2%, based on a carefully crafted oscillator cell (plus a factory calibrate).
I'm not sure if Parallax expects to approach that?
A higher-priced part like the P2 I'd personally expect to see used more often at the other end of the precision scale - e.g. connected to a TCXO.
In that context it certainly should spec a clipped-sine input, and I think the PLL needs a small improvement to give /M and /N VCO control.
If the P2 plans crystal-cap choices like the P1, finer control there would allow better crystal-oscillator calibration.
I don't mean a resistor. In my exploration of how NCO devices work, I found some of them have an additional tuning register that works alongside the one being used as the accumulator. There are two values involved in determining the frequency.
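One arrangement that matches that description (my assumption, pending the link) is a coarse frequency word plus a separate signed fine-tune word, both summed into the accumulator each clock. A C sketch:

```c
#include <stdint.h>

/* Hypothetical two-register NCO: a coarse frequency word plus a
 * separate signed fine-tune word. Both values together set the
 * output frequency; software can trim drift via `tune` without
 * touching the coarse setting. Names are illustrative only. */
typedef struct {
    uint32_t acc;    /* phase accumulator */
    uint32_t freq;   /* coarse frequency word */
    int32_t  tune;   /* small signed correction word */
} nco2_t;

/* Advance one clock; returns 1 on accumulator rollover (tick). */
static int nco2_clock(nco2_t *n) {
    uint32_t step = n->freq + (uint32_t)n->tune; /* modulo-2^32 sum */
    uint64_t sum  = (uint64_t)n->acc + step;
    n->acc = (uint32_t)sum;
    return (int)(sum >> 32);                     /* carry out */
}
```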
I'm about to head out, and will post a link later when I return.
Yes, something graphical, but not simplistic. I love the idea of a graphical sandbox that just runs, and then when you're satisfied, out pops the Verilog. That could be a worthy endeavor.
Typing and syntax are efficient for most things, but defining concurrent hardware logic with state machines needs something more graphical. All the graphical things I've seen, though, impose too many limitations. How to craft hardware like you would mold clay is hard to figure out, but it would sure be worth the trouble. Text is just so tedious in this case.
Text is fine for procedural languages, but too one-dimensional for lots of concurrent logic expressions. After all, procedural languages ARE one-dimensional because of their inherent sequentiality.
Hundreds of concurrent hardware expressions need to be broken up into their own little blobs, without concern for order. Having them in some order implies a need for order, which is misleading. Verilog ASIC files actually must be ordered to avoid use-before-declaration problems, which should not matter, but does, due to the legacy tools.
Ah, ok.
Analog trim has a place, but digital control is usually lower cost.
The NCO-as-baud also implies some preload phase value, as you want to both phase-lock and frequency-lock to the trigger edge. I think that may be OK as just Adder >> 1?
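One reading of that Adder >> 1, reusing the nco_t sketch from earlier (this is my interpretation only):

```c
/* On the trigger (start-bit) edge, preload the accumulator to half
 * its range rather than zero, so the first rollover fires half a bit
 * period after the edge. That phase-locks to the edge AND centers
 * the later sample ticks mid-bit, UART-style. */
static void nco_resync_midbit(nco_t *n) {
    n->acc = 0x8000;   /* half of the 16-bit accumulator range */
}
```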
LabVIEW kind of lets you do this already...
It takes your graphical LabVIEW code and programs an FPGA with the result.
Costs a fortune and takes forever to compile though...
IMHO, what Chip is talking about may be simpler and more easily used. I wish I were at a place to attempt it. For programming purposes, as described here, the simulation would not need to run all that fast.
Could really be worth doing. Testing can often take longer than design, which is why I like the direction of http://www.myhdl.org/
This is a Python front end that allows easy testing, and generates HDL.
However, USB is something of a special case, and likely needs testing done on real USB streams; this should be possible in small steps, in sniffer mode on an active USB link.
BTW: Is Chip saying that he's dropping state machine stuff in order to add USB support?
I think yes.
Chip, is that so? If so, I understand if there isn't the space for it. But if there is, it would still be nice to have the simplified version that I laid out (or something like it). It would still allow some small level of flexibility for future protocol/signaling support. Frankly, I thought the state machines were one of the neatest features of the smart pins/cells, at least in concept.
We could improve the programmable modes and put them into even pins, while odd pins can have USB modes.
Chip,
Is it possible to have the input pin(s) routed via the smart pin logic and then back to the INA input, so that a normal read of the pins using INA could yield the input pin(s) as modified by the smart pin logic, without needing to actually read the smart pins?
Just a single mux and bit to select either the normal pin input or the smart pin logic output, fed to the respective INA pin. This could permit some interesting possibilities.
To understand the custom state machine modes, I tried to draw a schematic of the whole logic for both modes.
It's a bit simplified, so as not to make it look too complicated ;-)
evan,
I don't understand your comment. Surely the input to the smart pin logic is already there. In fact it can come from any input pin within +/- 3 pins.
All 64 inputs go to a ring that can be read by any cog using the INA-type instruction (e.g. MOV xxx,INA).
What I am asking is: is it possible for those 64 ring lines to have a mux inserted that can select either the normal/default actual input pin (I think it is registered in the P2) or the output from the smart pin logic? I.e., each input bit could come either from the actual pin or via the smart pin logic (one smart pin per input/output pin).
Sorry, by INx I meant the Cog input, not the Smartpin input. Each Cog that wants to talk to any Smartpin has to twiddle a number of lines. The INx bit line is one part of that mechanism. So, when a Smartpin is activated by a particular Cog, that Cog's input bit is already consumed in managing the Smartpin of interest.
I was under the impression that it was still possible, even when smart pins were activated, to write to the OUTx port/pin and read the INx port/pin. When wanting to talk to the smart pins, special read and write instructions were used which toggle the INx/OUTx pins at 2x the speed.
Dunno, but that mux you're asking for must already exist for non-Smartpin reads anyway. So it comes down to whether an active Smartpin hogs its Cog input or not.
- It raises IN on receiving a byte, on end of packet, or on the DP/DM lines holding constant for seven bit periods.
- The top 5 bits of the 32-bit Z register hold a counter that increments on every update (IN gets raised).
- The bottom three 9-bit fields hold the latest three messages, with the bottom field holding the most recent. The older messages scroll left on each update.
- As soon as the last byte is received (SE0 sensed), an IDLE event is immediately posted so that the cog can hurry up and respond to the packet.
This has been challenging. There's got to be a better way to compose hardware than all this tedious thinking and typing. The brain would be so much more satisfied with some kind of 3D image of a machine, or something.
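To make that Z layout concrete, here is how a cog-side unpack might look, sketched in C. The field positions are inferred from the description above; the real packing is whatever Chip implements.

```c
#include <stdint.h>

/* Unpacking the Z register as described: Z[31:27] is the 5-bit update
 * counter; Z[26:0] holds three 9-bit message fields, newest at the
 * bottom, older ones scrolled left on each update. Bit 8 of each
 * field separates a plain data byte from a pure bus-state message
 * (its exact polarity is debated below). */
typedef struct {
    uint8_t  seq;      /* update counter, Z[31:27] */
    uint16_t msg[3];   /* msg[0] = newest, each 9 bits wide */
} usb_rx_t;

static usb_rx_t unpack_z(uint32_t z) {
    usb_rx_t r;
    r.seq    = (uint8_t)((z >> 27) & 0x1F);
    r.msg[0] = (uint16_t)(z & 0x1FF);
    r.msg[1] = (uint16_t)((z >> 9) & 0x1FF);
    r.msg[2] = (uint16_t)((z >> 18) & 0x1FF);
    return r;
}

/* A cog can spot missed updates by diffing seq against its own count,
 * then split each field into its flag bit and payload byte: */
static int     msg_flag(uint16_t m) { return (m >> 8) & 1; }
static uint8_t msg_byte(uint16_t m) { return (uint8_t)m;   }
```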
Hi Chip, sounds like you are going well despite being taxed by it.
A couple of points and possible optimizations:
1) USB 2.0 spec, section 8.1: "Bits are sent out onto the bus least-significant bit (LSb) first, followed by the next LSb, through to the most-significant bit (MSb) last."
As USB sends the LSB first, received bits are best shifted right into the 9-bit fields of Z as each byte is reassembled. Otherwise a REV instruction will be required for every single arriving byte, and that only adds overhead to the time-critical COG loop. I guess this messes things up a bit for your Z as you currently have it shifting left. We may not need to keep prior bytes if that feature is tricky to retain.
2) I think you should flip the meaning of the bit8 value, so that 0 means a normal data byte and 1 means the exceptional cases with pure bus states. This potentially allows MOVS-type actions to be used on the normal data-byte cases (perhaps more useful for transmit), and bit8 will already be cleared for you. Might save another instruction.
3) The bit destuffing happens after the NRZI decoding on receive, and bit stuffing before NRZI on transmit. So if stuffing is in HW, it means we need NRZI in HW too (I'm not sure if it has been called out explicitly before). See 7.1.8. We don't send Js and Ks; we send 0s and 1s, and they are converted to J/Ks by the NRZI process after the serial bit-stuffing process, with the reverse on receive. You probably are aware of this, but just making sure. (There's a sketch of this layering after these points.)
4) From the spec again, section 7.1.7.4.1
"The start of a packet (SOP) is signaled by the originating port by driving the D+ and D- lines from the Idle state to the opposite logic level (K state). This switch in levels represents the first bit of the SYNC field. Hubs must limit the change in the width of the first bit of SOP when it is retransmitted to less than ± 5 ns. Distortion can be minimized by matching the nominal data delay through the hub with the output enable delay of the hub.
The SE0 state is used to signal an end-of-packet (EOP). EOP will be signaled by driving D+ and D- to the SE0 state for two bit times followed by driving the lines to the J state for one bit time. The transition from the SE0 to the J state defines the end of the packet at the receiver. The J state is asserted for one bit time and then both the D+ and D- output drivers are placed in their high-impedance state. The bus termination resistors hold the bus in the Idle state. "
So as stated above, your SE0 detection (if it was just using a single SE0 sample) might be a little simplistic, and could be falsely triggered by noise on the line, where we record a single SE0 as a transient. If you weren't already doing this, I think we would need to see SE0 for two bit times, followed by idle, to really know the EOP condition has happened (sketched after these points). I hope that is how you have detected the EOP. In software the COG will also be counting incoming bytes, so we can ignore any overrun if we don't get the EOP within a reasonable length for the given packet type and endpoint configuration; no need to worry about that. The COG will know the maximum lengths possible to allow, and the CRCs will save us too if two packets overrun.
5) As Cluso mentioned above somewhere, depending on how you do your clock recovery, it might be nice to allow sync detection to still happen after a couple of KJ transitions have occurred while you are still locking to the bit period. This could be the case after long bus-suspend intervals, where the clock might have drifted without arriving packets causing any transitions to lock to. But if you can reset your clock-rate timeout counters on the first transition, like people do with start bits in async serial comms with a half-period timeout, you probably won't miss a sample, and this could be moot.
What is the 5-bit counter in upper Z used for? Is that like a sequence counter, to know if we are keeping up with the incoming data? Handy.
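To go with points 3 and 4, here is a C sketch of the NRZI/bit-stuffing layering and the two-sample SE0 qualifier. J=1/K=0 full-speed polarity is assumed, and the names are illustrative, not what Chip has in hardware:

```c
/* Layering from point 3: bit stuffing sits between the raw bits and
 * the NRZI line coding. TX: raw bit -> stuff a 0 after six straight
 * 1s -> NRZI encode. RX: NRZI decode -> discard the stuffed 0. */

/* NRZI-encode one (already stuffed) bit; returns the new line level. */
static int nrzi_encode(int bit, int *line) {
    if (bit == 0) *line ^= 1;   /* 0 = transition, 1 = no transition */
    return *line;
}

/* NRZI-decode one line sample: no transition means a 1. */
static int nrzi_decode(int line, int *prev) {
    int bit = (line == *prev);
    *prev = line;
    return bit;
}

/* TX stuffing: returns 1 when a 0 must be inserted after this bit.
 * Usage: emit nrzi_encode(b, &line);
 *        if (tx_needs_stuff(&ones, b)) emit nrzi_encode(0, &line); */
static int tx_needs_stuff(int *ones, int bit) {
    *ones = bit ? *ones + 1 : 0;
    if (*ones == 6) { *ones = 0; return 1; }
    return 0;
}

/* RX destuffing: returns 1 when the decoded bit is the stuffed 0 and
 * must be discarded (a 1 here would be a bit-stuff error). */
static int rx_is_stuffed(int *ones, int bit) {
    if (*ones == 6) { *ones = 0; return 1; }
    *ones = bit ? *ones + 1 : 0;
    return 0;
}

/* Point 4's EOP qualifier: only declare end-of-packet after SE0
 * (D+ and D- both low) on two consecutive bit samples, then J
 * (D+ high, D- low at full speed); a one-sample SE0 glitch resets. */
static int eop_seen(int dp, int dm, int *se0_count) {
    if (dp == 0 && dm == 0) { (*se0_count)++; return 0; }
    int eop = (*se0_count >= 2) && dp == 1 && dm == 0;
    *se0_count = 0;
    return eop;
}
```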
That approach really does start to give the ability to extend the range of smart pin modes and applications within a constrained gate count, especially as you can already select a range of pins as the A/B inputs to a smart pin's LUT in the programmable mode (so a smart pin doesn't have to use its "own" pin as such; it can use its neighbour's too). Yes, it will involve more resource planning in the SW design. Good compromise.
Don't know about even/odd mixing of pins. I'd think you'd want D+ and D- on adjacent pins.
I think Chip plans that.
His comment suggests he thinks TX & RX can pack into one cell, and the pair nature means only every second cell NEEDS USB.
USB is half duplex, but I think one cell means no chance to pre-load TX for less turnaround delay.
HS USB does need a crystal.
Is the COG-Pin link able to manage the half-duplex turnaround OK?
I remember the USB discussion many years ago.
You get a USB chip for under 4 USD, and you get smart pins as a bonus.
This is a never-ending story. The P2 should have been finished a long time ago.
I’m not expecting a release of the P2 until the USB18.5 is released!
As they say, too many chefs make a mess.
USB support is huge and (I think) will help sell chips.