It appears that the glitch is caused by the pin being floated at the end of data packets.
Floating would be ok, provided it floated from HI.
That trailing pulse looks to be less than one bit time (which in itself is a bad sign), and it looks to (incorrectly) drive low, then float, almost at the same instant.
Strictly, the TX pin should not float until after the STOP bit is sent, and then it should float from a Logic 1 (which means a slight pulldown would be needed to see the tristate timing).
Is the TxDone signal from the smart pin coming back too early ?
Thanks so much for thinking to test those things. Of course, floating TX can invite the unknown. Maybe we'll just keep it driven in full-duplex mode. For half-duplex mode, there'd need to be a user pull-up.
Engaging a crystal and the PLL is maybe too much for the serial loader, as the FPGA and real silicon have different circuitry. The serial loader needs to work from the common base.
I don't understand that comment. If the serial loader is told the clock frequency and PLL setup it should be able to set it up. Isn't that what the P1 loader does? It reads the clock frequency and PLL setting from the first 5 bytes of the Spin header.
The real silicon uses a 24-bit configuration word. I could make a Prop_Clk command to configure it. It's just not very testable on the FPGA, so I'm reluctant. I think I'll do it now, though, with your prodding.
At 160MHz, we could go 8x as fast as we can at 20MHz. That would put a baud limit of 16MHz, instead of the current 2MHz. We could load all 512KB in 440ms using Prop_Txt (base64).
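(Rough math, assuming 10 bits per character on the wire and ignoring any per-line overhead: 512KB = 524,288 bytes, which base64-encodes to about 699,052 characters; at 16M baud each character takes ~625ns, so the whole image is roughly 437ms - call it 440ms.)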
What if you simplified the bootloader by getting rid of the pin mask check and leaving out crystal support? People who want selective programming, faster programming, or other fancy programming features could just load their own second-stage bootloader that tests pin masks, sets up the crystal, etc. and then receives the program (if it decides that the program is in fact intended for it and not another P2 on the bus).
I kind of like that. Anyone have any thoughts? Is the pin-mask thing overkill for the ROM? I think 2M baud is pretty decent, anyway.
I would like both features, the pin-mask and the clock-setup through the serial loader.
If I have to choose one, I am for the clock-setting through the serial loader, because I feel there will be many more cases where only one P2 is present, and of course we want it to boot faster with large code. If I look at my needs, I have only one application with two P2s (used in an HA configuration, thus using a single image); all the others are single-P2.
But with my proposal, you could still do pin masking and crystal setup. It'd just be done by the second stage bootloader instead of the first stage bootloader. You'd get both capabilities, and at the same time a simpler ROM.
You could still load large programs quickly, by using the first stage bootloader running off the RC clock to load a small second stage bootloader that switches to the crystal and downloads the whole program at high speed. You could still do the pin mask test: every P2 would accept the same second stage bootloader, but then the second stage bootloader would do the pin mask test and only accept programs intended for it.
EDIT: The second stage bootloader could be tiny, by reusing the first stage bootloader's code. All that'd be required to switch to the crystal and then download a program would be:
clkset {whatever}
{fix smartpin baud}
jmp #first_stage_bootloader ' run first stage bootloader again, but with fast crystal this time
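Purely as an illustration of the '{fix smartpin baud}' placeholder (assuming the async-serial smart pins take clocks-per-bit in the upper 16 bits of X and bits-minus-1 in the low 5 bits, and that XTAL_FREQ, BAUD, rx_pin and tx_pin are constants the stub would define), it could be something like:
        wxpin   ##(((XTAL_FREQ / BAUD) << 16) | 7), #rx_pin   ' new clocks-per-bit for the crystal clock, 8-bit words
        wxpin   ##(((XTAL_FREQ / BAUD) << 16) | 7), #tx_pin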
I've been looking into the weird Pnut loading issue which appeared on the BeMicro-A9 V24 image.
The issue seems to be caused by a combination of a routing difference with Quartus and the serial comms routines in ROM.
My initial focus was on the reset line but nothing really stood out as strange there.
I was able to capture pin activity on reset and Rx (#61) OK, but as soon as I try to probe the Tx (#50) pin the issue would disappear.
With a probe connected to Tx and a successful load from Pnut, I detected a glitch and slow rise at the end of each data packet.
This glitch can be seen at the end of the "Prop_Ver" string's last stop bit.
The same glitch can be seen when the status "." or "!" byte is transmitted too.
It appears that the glitch is caused by the pin being floated at the end of data packets.
This seems to explain why Identify Hardware works, as Pnut is looking for a "Prop_Ver" string, so the last character (LF) is ignored anyway.
Because the status response is a single byte, "." or "!", and it's being corrupted, the "lost" error is triggered.
I assume this might be linked to the half-duplex serial mode?
This also seems to explain why my own loader and Dave's loader work OK.
At 2M baud, a whole stop bit gets output, due to the time it takes to shut the transmitter off, but at 500K baud, only a fraction of a stop bit is output.
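(For scale, assuming the ~20MHz RC boot clock: a bit is 500ns, about 10 clocks, at 2M baud, but 2µs, about 40 clocks, at 500K baud - so a turn-off delay that is fixed in clock terms can span a whole bit at 2M baud, yet only a fraction of one at 500K.)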
TX would need a pull-up resistor, at least. We might even need to drive the pin high long enough to charge up any parasitic capacitance, before letting it float. Imagine if we had a 160MHz clock; the problem would be even more pronounced. This brings up some difficult issues with half-duplex mode, especially. The host would need to observe a short delay before turning the comm pin around, allowing the Prop2 some time to get the pin reliably high.
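(To put a number on that: with, say, a 10K pull-up and ~30pF of pad and cable capacitance - both just assumed figures - the RC rise to a solid high takes a few hundred nanoseconds, on the order of a bit time at 2M baud, which is why the host would have to wait before turning the pin around.)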
The easy way around all this is to only support full-duplex mode and always drive TX. Maybe we should do that. It's very straightforward then. What do you guys think? This needs to be as rock-solid as possible.
By adopting a full-duplex-only protocol, a user could download a short program to implement half-duplex mode, if he wanted it. He wouldn't need to observe TX from the Prop2, beforehand.
I realize that as soon as we start a serial dialogue, we MUST start in full-duplex mode, to at least drive TX high, in case it is being observed by the host. If we go into half-duplex soon after, we could cease driving TX, but TX would have already been driven high briefly, putting a certain limitation on secondary use for that pin.
I see now that half-duplex is not that practical to support in our ROM loader, given that we must initially support full-duplex, which requires driving TX high.
I always disliked Parallax's obsession with one-wire, half-duplex transmission.
Sure, the three-wire servo cables are nice, but I am sure one can buy 4-wire cables also. At the beginning it was quite confusing to me; later on I got used to it, but I never really liked it.
@Peter Jakacki told me that most bidirectional full-duplex transfers are in reality not really full duplex, since mostly we have a request/response scheme, and he is right as usual.
So I do understand the reasoning behind 1-wire half duplex, but to me it is still cumbersome.
But - hmm - I sold one single unit of all my projects in the last 8(?) years, so I am the most non-volume customer Parallax has.
This glitch can be seen at the end of the "Prop_Ver" string's last stop bit.
The same glitch can be seen when the status "." or "!" byte is transmitted too.
It appears that the glitch is caused by the pin being floated at the end of data packets.
At 2M baud, a whole stop bit gets output, due to the time it takes to shut the transmitter off, but at 500K baud, only a fraction of a stop bit is output.
TX would need a pull-up resistor, at least. We might even need to drive the pin high long enough to charge up any parasitic capacitance, before letting it float. Imagine if we had a 160MHz clock; the problem would be even more pronounced. This brings up some difficult issues with half-duplex mode, especially. The host would need to observe a short delay before turning the comm pin around, allowing the Prop2 some time to get the pin reliably high.
It seems worse than that, as the pin looks to drive LOW right at the time the tristate kicks in - which is why even adding a scope probe 'fixes it'.
Most well-designed UARTs should generate a TX_DONE signal, which is at the end of the TX_STOP, to avoid any such partial truncate effects.
That way, baud settings + SW interactions cannot affect bit lengths.
The easy way around all this is to only support full-duplex mode and always drive TX. Maybe we should do that. It's very straightforward then. What do you guys think? This needs to be as rock-solid as possible.
It's ok to drive full duplex, and have tri-state only on the single-pin mode.
The appeal of tristate in both modes is that it allows easier testing of single-pin modes.
We might even need to drive the pin high long enough to charge up any parasitic capacitance, before letting it float.
That should happen naturally, as the chip always sends a STOP bit. If it sends STOP, then floats, it has driven high before the float.
Looks to me like the float is not occurring after the stop bit ?
I kind of like that. Anyone have any thoughts? Is the pin-mask thing overkill for the ROM? I think 2M baud is pretty decent, anyway.
I think the pin-mask is useful to have, and if that is not in ROM, how do you load many P2's in the first place?
The pin mask could easily be optional, with a second simpler form that has no pin mask.
Not to someone who wants fastest possible boot time, in the smallest footprint.
Key features I see of PLL setting are:
* It is automatically optional; most small-code users would not use this, but it has to be useful for chip testing
* It allows a loader-MCU to easily perform the Xtal-fail feature many systems demand, which the P2 lacks.
Keeping firmware code small in the loader-MCU, leaves more room for user code!
* It allows faster loading, with minimal admin - no stub-loaders to include and version manage....
To me, stub loaders should be for the very special cases, not just going faster on the serial link.
I realize that as soon as we start a serial dialogue, we MUST start in full-duplex mode, to at least drive TX high, in case it is being observed by the host.
Most UART bridges & MCUs I know have pullups on the TX and RX pins. These are more modest (as you can see in the scope shots above), but they are there.
As long as TX floats, and never spikes low, the UART bridge will read a high just fine. The errors above are more because the pin drives low than because it floats, i.e. float-from-high is OK.
Someone can always add a pullup, but it really should not be needed, if the timing and drives are valid.
I did my own tests and saw that the pin was, indeed, being floated, not driven low, after the start of the stop bit. There is, however, a negative impulse, though there's not a clock-cycle worth of low time. It must be due to switching logic sources for driving the pin's output enable and output state. The output enable goes low on the same clock the output state goes low, but both signals are the product of different post-flop logic. So, there's maybe a nanosecond of the pin output state going low before the pin output enable goes low, causing a downward impulse. The pin pad circuitry is going to contribute more timing vagaries to this issue.
This is not an issue when you are controlling pin states directly via OUTx/DIRx, but when smart pins are involved, you may have a situation where the effective OUT and DIR states are changing on the same clock, but at slightly different delays within that clock. I think this can be gotten around, though, in the case of serial transmit, by doing a 'WRPIN #%00_11110_0,#tx_pin' to make the pin's DIR low, while the smart pin mode is still driving the output state high. Then, you kill the smart pin mode by clearing its DIR bit. That would put the effective DIR and OUT activity on different clocks, so that they weren't both changing at the same time, thereby getting rid of the glitch.
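As a sketch of that sequence in code (tx_pin and the use of TESTP to poll the 'done' flag are assumptions here, not the actual ROM code):
wait_done
        testp   #tx_pin  wc            ' wait for the smart pin to signal 'done' (start of the stop-bit state)
 if_nc  jmp     #wait_done
        wrpin   #%00_11110_0, #tx_pin  ' make the pin's DIR low while the smart pin still drives the output state high
        dirl    #tx_pin                ' then kill the smart pin mode by clearing its DIR bit - DIR and OUT now change on different clocks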
I did my own tests and saw that the pin was, indeed, being floated, not driven low, after the start of the stop bit. There is, however, a negative impulse, though there's not a clock-cycle worth of low time. It must be due to switching logic sources for driving the pin's output enable and output state.
OK, so a logic race condition, with overlap in this case... explains why a probe-connect changed things...
The output enable goes low on the same clock the output state goes low, but both signals are the product of different post-flop logic.
I'm puzzled .... why does the output state go low at that time ?
The smart pin has started sending a stop bit in your description above ?
Ideally, in a UART, you signal done at the end of the stop bit (not the start of the stop bit), and then it remains high until a new Tx byte arrives.
With no change in output state, it should be safe to disable pin drive, and get no glitches.
Yes, the smart pin has started holding the pin high for the stop bit, but is done transmitting. Upon receiving a new word to send, it outputs a whole stop bit before the start bit. It works this way so that you can reconfigure as a serial receiver before the stop bit period would be over, so you'd be ahead of a possible new start bit. In my code, I was waiting for that 'done' signal, indicating that we had begun the stop-bit state.
I edited my prior post to explain why the glitch was occurring. It has to do with smart pins changing their DIR and OUT states simultaneously. Things can be done, though, to get around the simultaneous changes and avoid this kind of glitch.
I still do not see why the OUT pin needs to change state there ? - it should still be sending the STOP bit, by your description.
Yes, the smart pin has started holding the pin high for the stop bit, but is done. Upon a new word to send, it outputs a whole stop bit before the start bit. It works this way so that you can reconfigure as a serial receiver before the stop bit period would be over, so you'd be ahead of a possible new start bit. In my code, I was waiting for that 'done' signal, indicating that we had begun the stop-bit state.
Reacting faster like that sounds like a good idea, but it can cause problems in system design, because you have not actually completed the stop bit.
That means some baud-related delay is needed to patch the code, for slower baud speeds.
Does the new word to send wait for the next Tx Clock edge, so that Tx data is always Tx-clock paced, or does it time from the Smart-Pin load, meaning you can get fractional stop bits (+ the whole stop bit you are about to send) ?
When the tx smart pin is 'idle/stop', or you set the DIR bit low to reset the tx smart pin, it holds the pin's DIR and OUT inputs both high. When you do a 'WRPIN #0,#tx_pin' to disable the smart pin, with your DIR bit low, the smart pin ceases driving the pin's DIR and OUT signals and the cog DIR and OUT bits take over, causing a negative transition on both the pin's DIR and OUT. This is where the glitch occurs, as the OUT apparently drops earlier than the DIR.
Reacting faster like that sounds like a good idea, but it can cause problems in system design, because you have not actually completed the stop bit.
That means some baud-related delay is needed to patch the code, for slower baud speeds.
Yes, this is an issue. If we made it the other way, though, that would cause the start bit to begin without delay, possibly without a preceding high period. That would be bad, too.
What we need is a way to cause the async tx circuit to stay idle for a clock period, by giving it some special command, as needed.
Does the new word to send wait for the next Tx Clock edge, so that Tx data is always Tx-clock paced, or does it time from the Smart-Pin load, meaning you can get fractional stop bits (+ the whole stop bit you are about to send) ?
If the transmit buffer has a new word ready, the transition into the timed stop bit, followed by the start bit, is immediate. Otherwise, there will be some extra time, until a new transmit word is written.
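(So, as described: with a new word already buffered, the line goes ...data bits | timed stop bit | start bit | next data... with no gap, while with nothing buffered it goes ...data bits | held high (this is where 'done' is signalled) | then, once a new word is written, a timed stop bit | start bit | data...)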
When the tx smart pin is 'idle/stop', or you set the DIR bit low to reset the tx smart pin, it holds the pin's DIR and OUT inputs both high. When you do a 'WRPIN #0,#tx_pin' to disable the smart pin, with your DIR bit low, the smart pin ceases driving the pin's DIR and OUT signals and the cog DIR and OUT bits take over, causing a negative transition on both the pin's DIR and OUT. This is where the glitch occurs, as the OUT apparently drops earlier than the DIR.
So you could set the Boolean/COG Pin OUT to 1, to avoid this hand-over effect ?
What we need is a way to cause the async tx circuit to stay idle for a clock period, by giving it some special command, as needed.
I've seen UARTs use two Tx flags to cover this.
One is Tx_Ready, which means you can load another byte into the Tx queue (and do so earlier in the physical byte-out time), and the other is Tx_Done, which (ideally) shows that the stop bit is completed and a pin can be tristated.
Some UARTs allow a 3rd pin to connect to this HW signal, for hardware RS485 handling at high baud rates.
With P2 having a leading stop bit, that end-of-stop becomes harder to nail down and signal.
Maybe I can make it do a timed stop bit if there is no new word buffered to transmit when the current word is done. Then, it would signal 'done' after the full stop bit.
Maybe I can make it do a timed stop bit if there is no new word buffered to transmit when the current word is done. Then, it would signal 'done' after the full stop bit.
Interesting idea. That would make RS485 code much more baud-tolerant.
Out of curiosity, what if we were to get rid of the UART boot altogether? Would there be a way to just use full duplex SPI? For instance, suppose we had the following pins assigned: CLK (63), MISO(62), MOSI(61), /CS(60). With that in mind, the boot sequence(s) might go something like:
Programming:
1. P2 tests CLK to detect that it is pulled low. This indicates P2 is initially in SPI slave mode.
2. Receive command to load program (over SPI) into hub ram.
3. Receive command to switch to SPI master mode and perform the programming.
4. P2 switches to SPI master mode, pulls /CS low, and starts programming flash.
5. The external programmer would then monitor /CS to see when P2 took it high again to indicate that the operation was done and had returned to slave mode.
Loading without programming:
1. P2 tests CLK to detect that it is pulled low. This indicates P2 is initially in SPI slave mode.
2. Receive command to load program (over SPI) into hub ram.
3. Receive command to trigger appropriate COGINIT.
Booting from flash:
1. P2 tests CLK to detect that it is pulled high. This indicates that the external programmer is not connected.
2. P2 pulls /CS low and reads the flash into hub ram.
3. P2 triggers the appropriate COGINIT.
There are certainly some details that would need to be hashed out, but I think you get the idea...
Out of curiosity, what if we were to get rid of the UART boot altogether? Would there be a way to just use full duplex SPI? ...
Anything is possible, but one target of easy booting is student and developer downloads, and USB-UART bridges are far more common than USB-SPI bridges.
It's also very common to need a printf-style debug, and there, a UART is the de facto connection.
The ROM code is really not tight (especially now SHA security is gone), so why not be flexible ?
There is more argument for including i2c in the boot than for removing the UART.
i2c uses fewer pins, and opens up more choices.
Price-wise, i2c struggles - even a modest 8kB part is ~13c/4k, whilst in SPI a flash device at ~15c gives you 512kB!
- however, there may be applications where those fewer pins of i2c, and/or more flexible slave device, really matter to a user.
Comments
How wide is that trailing pulse, in baud-times ?
I'll be up in the morning and I'll get on this.
On crystal support I am on Electrodude's side; there is already support for a second stage loader in software.
Mike
So you could set the Boolean/COG Pin OUT to 1, to avoid this hand-over effect ?
I suppose so, if DIR was also high.
Interesting idea. That would make RS485 code much more baud-tolerant.
That should solve this issue ? - Should be easy enough to try & check ?
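As a sketch of that hand-over (tx_pin and the exact ordering are assumptions here, not the actual ROM code), with the smart pin idle in its stop/high state and its DIR bit still high from enabling it:
        outh    #tx_pin      ' pre-set the cog's OUT bit to 1 (it doesn't reach the pin while the smart pin is driving it)
        wrpin   #0, #tx_pin  ' clear the smart mode - the cog's DIR and OUT take over, both already high, so no glitch
        dirl    #tx_pin      ' if the pin is to float, it now floats from a high level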