All combinations of A rise/fall, B hi/lo and B rise/fall, A hi/lo * would be helpful, just like a "stateful" scope trigger. This would accommodate quadrature decoding (which I know is already covered elsewhere), as well as Start/Stop detection for I2C.
* or, equivalently, all combinations of A, A(prev), B, and B(prev) , with an additional command long telling what to do (increment, decrement, nothing, error) for each.
-Phil
I have something like that now, where for each counter mode, there's a fixed pattern of 16 two-bit values which are used as +0/+1/-1. There are sixteen to accommodate all cases of A, A(prev), B, B (prev). The %10 case is unused, but could be an event trigger.
I've thought about making this 16x2-bit pattern user-settable, but it would require another 32 bits of flops (out of a current ~200) per smart pin. My thinking has been to keep such details away from the user, so he's not burdened with thinking about these patterns. It seems easier to just have modes for A-B encoders, counters, frequency measuring, etc. Otherwise, they'd have to load some cryptic long into the smart pin to get it to do something. I'm not against that, personally, but I don't want to overburden people with details. We could have labels for these values (ie 'pin_encoder' = %00_01_11_00__11_00_00_01__01_00_00_11__00_11_01_00).
What do you guys think?
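For anyone trying to picture the 16x2-bit pattern, here is a minimal Python sketch of how such a long could be decoded. It is an illustration only, not the smart pin logic. The index ordering is an assumption (previous A/B state in the upper two index bits, current A/B state in the lower two, field 0 at the least-significant end of the long), and `action` and `count` are hypothetical helper names. The 2-bit codes follow the post: %00 = +0, %01 = +1, %11 = -1, with %10 unused.

```python
# The 'pin_encoder' long quoted in the post, written as a Python literal.
PIN_ENCODER = 0b00_01_11_00_11_00_00_01_01_00_00_11_00_11_01_00

# 2-bit action codes as described in the post; %10 is unused.
ACTIONS = {0b00: 0, 0b01: +1, 0b11: -1}


def action(pattern, prev_state, cur_state):
    """Look up the +0/+1/-1 action for one clock.

    ASSUMPTION: state = (A << 1) | B and index = (prev_state << 2) | cur_state.
    The post does not define the ordering, so this is only one possibility.
    """
    index = ((prev_state & 0b11) << 2) | (cur_state & 0b11)
    code = (pattern >> (2 * index)) & 0b11
    return ACTIONS.get(code, 0)  # treat the unused %10 code as "do nothing"


def count(pattern, states):
    """Sum the actions over a sequence of sampled 2-bit A/B states."""
    total = 0
    for prev_state, cur_state in zip(states, states[1:]):
        total += action(pattern, prev_state, cur_state)
    return total


forward = [0b00, 0b01, 0b11, 0b10, 0b00]
print(count(PIN_ENCODER, forward))        # one full quadrature cycle
print(count(PIN_ENCODER, forward[::-1]))  # same cycle, opposite direction
```

Under that assumed ordering, the quoted 'pin_encoder' value counts +1 per step for the sequence 00, 01, 11, 10, 00 and -1 per step in the reverse direction, which is what you would expect from an A-B encoder pattern.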
This comes down to coverage.
Do you have a table of the modes supported by your current code ?
- ie what is missing, from the simpler choices ?
Flip-flops are not as costly in an ASIC as they are in FPGAs, and they replace mini-ROMs.
Having predefined names for the modes is not hard to document, so users would rarely need to know the 10101's.
I got the smart pin pretty much done, but I've decided to combine the measurement and timer logic blocks. I am adding some modes where you can inc/dec/clear on any combination of A, A(prev), B, B(prev) and selectively report results back or just alert via the IN bit.
The A and B inputs to smart pins can now be the local pin, any of three pins above or below, or the OUT signal to either the local or adjacent pin. This will allow a smart pin to use another smart pin's output as one of its inputs, or allow you to control it via the OUT bit. Just working out the details now.
These things are just getting smarter and smarter! I'm afraid that they're getting too smart for me to understand!
It's nothing that a little diagram won't clear up.
I was thinking, after I worked all night, that inspiration is akin to a true random number generator, where you must wait for some phenomenon to generate the number. You can't just get it on demand. Most of the night, I just kind of sat there waiting for things to congeal in my head, not knowing when, or if, it would come. Finally, almost on its own, the mental picture emerged and it was quite clear. I had been trying to think about this for a week, to no avail.
Sometimes just taking a break is the best thing to do. After beating my head on a problem for hours I would go for a coffee break, and quite often the solution or an idea that led to it would come to me while I was relaxing. This was particularly true for the university cafeterias. Must have been the ambiance there.
We've got a framework now that can do all kinds of measurements, including things like A-B encoding. I had these measurements working, already, but not as a general case where the user could set up any configuration. This is going to be very simple to code, but these darn ideas are hard to get.
Love the way the smart pins are coming together. So much flexibility and power yet they sound like they will be so easy to use. Really looking forward to getting my hands on the final product.
I am adding some modes where you can inc/dec/clear on any combination of A, A(prev), B, B(prev) and selectively report results back or just alert via the IN bit.
I like the fact that clear is in there! But did you mean to exclude do nothing?
-Phil
Yes, 'do nothing' was one of the four combos.
You are probably wondering exactly how it works and what it can do.
First, a quick diagram of the smart pin.
Second, some waveforms and how it came to be. The waveform was actually gapped before it became flat.
But you are right, a little diagram does make it clearer.
I admit to having a high fever, but I swear that smart pin looks like a retro-virus and the output looks like an alley during Oktoberfest.
I am adding some modes where you can inc/dec/clear on any combination of A, A(prev), B, B(prev) and selectively report results back or just alert via the IN bit.
The A and B inputs to smart pins can now be the local pin, any of three pins above or below, or the OUT signal to either the local or adjacent pin. This will allow a smart pin to use another smart pin's output as one of its inputs, or allow you to control it via the OUT bit. Just working out the details now.
Sounds flexible.
There are some use cases where you might want to have a 'hold-off' on time interval A-B, or an average time interval A-B, and there a 3rd pin cell could get into the mix.
Others, where you may want to capture some user-defined number of cycles.
There, one Pin cell would count the N and another adjacent cell capture time on /N TC
Regarding the A and B signals query: one of the possible sources can be "cog output" and "output pin", so that the input (smart block output) of the same pin can trigger on differences. In this case the delay between A and B that someone suggested before can be used to delay the event. This (the resulting input) can now become an interrupt source.
With all the discussions going on, I have not understood whether the smart pins are a purely digital matter or also involve the analog part. If they do, I would like to know whether the following function (picture) can be done with the foreseen features. I mean having a comparator with swappable inverting and non-inverting inputs (or a selectable negated output) that can be ORed with the cog's output, for example to quickly shut down a PWM output when the comparator's preset level is exceeded. If the comparator could also have an optional selectable hold memory on its output, even better.
The aim is to use as few external components as possible, run the code as slowly as possible to save energy, and still be able to react quickly when limit values are exceeded.
For example to quickly shut down a PWM output when the comparator's preset level is exceeded.
I think Chip has mentioned Reset of PWM already, from an adjacent pin,
My guess is that the Reset signal can be either direct, or via the Analog Comparator, but I've not seen Tpd or Vos numbers on the analog parts of the design yet.
There's an SMPS mode for PWM that can shorten the pulse, based on a shunt resistor being input to a comparator whose other input connects to an 8-bit DAC inside the I/O pin.
Okay. I've been off the forum for a few days, working on the programmable mode, and I think I've got the whole smart pin done! I thought I was here a few times before, but I think this is it. I'll do a complete test on Monday to make sure I haven't broken anything and then I'll do the FPGA image compiles.
Making the programmable mode has been tough. It can do a lot of things, but it's going to take some usage before we know if it does what we need it to do. You can build all kinds of things out of it, like strange pulse measurers, serializers/deserializers, event trappers, and I don't know what else. It's not that complicated, but it's a little like chess, where you have different pieces that can move differently, and you could go insane trying to figure out all possibilities. This is going to require some documentation with diagrams.
To briefly describe the programmable mode, it works like this:
It inputs two pins: {B-previous, B-current, A-previous, A-current} to get a 4-bit input value on each clock. The A-current bit can be selected from a few other sources, as well.
It maintains a 32-bit variable which can be set to 0/1/-1, incremented or decremented (with or without saturation), right- or left-shifted with A-previous or 0, set to any value by a cog, or left the same, on each clock, according to input and state.
It uses dual 16-bit fields to pick 0's and 1's, based on the inputs and state. In 2-state mode, the 16-bit fields are selectively toggled between to get the 0/1 values. In 1-state mode, the fields are ganged together to get 0/1/2/3, to make four possible actions per clock, instead of just two.
Ultimately, a 5-bit opcode executes on each clock:
xx000 = <same>
xx001 = set acc = 0
xx010 = set acc = 1
xx011 = set acc = -1
xx100 = increment acc
xx101 = decrement acc
xx110 = A-previous/0 >> acc >> OUT
xx111 = OUT << acc << A-previous/0
x1xxx = next, 2-bit: toggle bit 0 xor for ops xx010..xx111 starting this op
x1xxx = next, 1-bit: toggle state, switches bitfields
1xxxx = signal: raises IN bit and outputs the 32-bit variable for reading by cog
You pick four 5-bit opcodes and the dual 16-bit fields, along with seven configuration bits and a 5-bit counter value which can be used to selectively delay toggling into the other state (by x1xxx occurrences).
As mentioned, A-current can be substituted with other signals: MSB or LSB of the 32-bit variable or the state bit (useful for 1-state mode, where fields are always combined, not toggled between).
You can pick either the MSB or LSB of the 32-bit variable to drive the OUT signal, if the cogs' OUT is not used. For each A and B input, you can pick the local pin's input, the inputs from relative pins +3/+2/+1/-1/-2/-3, the OUT signal coming from the cogs, or the adjacent pin's final OUT signal. Before A goes into the smart pin flops, you can select A^B, A|B, A&B, or just A. There are inversion selectors for all inputs, as well as output.
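To make the opcode list a little more concrete, here is a rough Python sketch of how the low three opcode bits and the signal bit might act on the 32-bit variable. It is a reading of the list above, not the actual logic: saturation, the "next" bit (x1xxx), the dual 16-bit fields and the input selectors are all left out, and `step` is a hypothetical name. For the shift ops, the assumption is that the bit leaving the variable is what goes to OUT.

```python
MASK32 = 0xFFFFFFFF


def step(acc, opcode, a_prev=0):
    """Apply one 5-bit opcode to the 32-bit variable.

    Returns (new_acc, out_bit, signal). out_bit is None unless a shift
    op pushed a bit toward OUT. The x1xxx "next" bit is ignored here.
    """
    op = opcode & 0b111
    out_bit = None
    if op == 0b001:
        acc = 0
    elif op == 0b010:
        acc = 1
    elif op == 0b011:
        acc = MASK32                      # -1 as a 32-bit value
    elif op == 0b100:
        acc = (acc + 1) & MASK32          # increment, no saturation modeled
    elif op == 0b101:
        acc = (acc - 1) & MASK32          # decrement, no saturation modeled
    elif op == 0b110:
        out_bit = acc & 1                 # LSB leaves toward OUT (assumed)
        acc = (acc >> 1) | ((a_prev & 1) << 31)
    elif op == 0b111:
        out_bit = (acc >> 31) & 1         # MSB leaves toward OUT (assumed)
        acc = ((acc << 1) & MASK32) | (a_prev & 1)
    # op == 0b000 leaves the variable unchanged
    signal = bool(opcode & 0b10000)       # 1xxxx raises IN for the cog
    return acc, out_bit, signal


acc = 0
for _ in range(3):
    acc, _, _ = step(acc, 0b00100)        # three increments
print(acc)
print(step(acc, 0b10000))                 # hold the value and signal the cog
```

Passing `a_prev=0` to the shift ops corresponds to the "A-previous/0" choice of shifting in a zero.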
Here is a list of all pin modes:
instructions
--------------------------------------------------------------------------------------
PINMODE D/#,S/#    'write D[31:0] to pin S[5:0] mode
PINSETX D/#,S/#    'write D[31:0] to pin S[5:0] X
PINSETY D/#,S/#    'write D[31:0] to pin S[5:0] Y
PINREAD D,S/#      'read data from pin s[5:0], mode dependent
PINACK  S/#        'acknowledge pin to drop IN signal (alias for PINMODE #0,S/#)

                                         pad     pad
MMMMM    Description                     DIR     OUT    Pattern
--------------------------------------------------------------------------------------
00000    OUT (default)                   DIR     OUT
00001 *  DAC cog channel                 DIR     OUT    output-update-repeat
00010 *  DAC 16-bit dither               DIR     OUT    output-update-repeat
00011 *  DAC 16-bit pwm                  DIR     OUT    output-update-repeat
00100 *  tra pulse                       DIR     mode   wait-output-repeat
00101 *  tra edges                       DIR     mode   wait-output-repeat
00110 *  nco freq                        DIR     mode   output
00111 *  nco duty                        DIR     mode   output
01000 *  pwm triangle 16:16              DIR     mode   output-update-repeat
01001 *  pwm sawtooth 16:16              DIR     mode   output-update-repeat
01010 *  pwm SMPS, A=V                   DIR     mode   output-update-repeat
01011 *  pwm SMPS, A=V, B=I              DIR     mode   output-update-repeat
01100 *  A-high inc                      DIR **  OUT    measure-report-loop
01101 *  A-rise inc                      DIR **  OUT    measure-report-loop
01110 *  A-high inc, B-high dec          DIR **  OUT    measure-report-loop
01111 *  A-rise inc, B-rise dec          DIR **  OUT    measure-report-loop
10000 *  A-B encoder                     DIR **  OUT    measure-report-loop
10001 *  time A states                   DIR **  OUT    time-report-loop (MSB = state)
10010 *  time A high                     DIR **  OUT    time-report-loop
10011 *  time A-rise to A-rise           DIR **  OUT    time-report-loop
10100 *  custom 2-bit, 1-pattern         DIR **  OUT    custom
10101 *  custom 2-bit, 1-pattern, out    DIR **  mode   custom
10110 *  custom 1-bit, 2-pattern         DIR **  OUT    custom
10111 *  custom 1-bit, 2-pattern, out    DIR **  mode   custom
11000 *  B-clk tx byte                   DIR     mode   transmit-repeat
11001 *  B-clk tx long                   DIR     mode   transmit-repeat
11010 *  B-clk rx byte                   DIR **  OUT    receive-repeat
11011 *  B-clk rx long                   DIR **  OUT    receive-repeat
11100 *  async tx byte                   DIR     mode   transmit-repeat
11101 *  async tx long                   DIR     mode   transmit-repeat
11110 *  async rx byte                   DIR **  OUT    receive-repeat
11111 *  async rx long                   DIR **  OUT    receive-repeat

*  DIR from cogs: 0=reset, 1=start; IN to cogs: 1=done; 'PINACK pin' = clear done
** set %HHHLLL to %111111 (float/float) if your intent is to input
For the various modes, PINSETX/PINSETY are used to configure any settings after PINMODE is used to set the mode. PINREAD can be used at any time to read back the last-updated 32-bit value from the smart pin. PINACK is used to cancel the IN signal coming back from the smart pin that lets you know that it finished something. After PINACK, the smart pin will drop IN and then raise it again on the next completion. These events can drive interrupts.
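The configure / wait / read / acknowledge flow described above could be pictured with a toy Python model like the one below. This is only a sketch of the handshake as I read it, not real hardware behavior: `SmartPinModel` and its `complete` method are hypothetical, and where the MMMMM bits sit inside the 32-bit mode long is not given here, so the mode value is stored without interpretation.

```python
MASK32 = 0xFFFFFFFF


class SmartPinModel:
    """Toy model of the IN / PINACK handshake (hypothetical, for illustration)."""

    def __init__(self):
        self.mode = 0
        self.x = 0
        self.y = 0
        self.result = 0
        self.in_bit = 0

    def pinmode(self, value):
        self.mode = value & MASK32        # field layout inside the long is unknown

    def pinsetx(self, value):
        self.x = value & MASK32

    def pinsety(self, value):
        self.y = value & MASK32

    def complete(self, value):
        """Stand-in for the hardware finishing a measurement or transfer."""
        self.result = value & MASK32
        self.in_bit = 1                   # IN goes high: something finished

    def pinread(self):
        return self.result                # last-updated 32-bit value

    def pinack(self):
        self.in_bit = 0                   # drop IN until the next completion


pin = SmartPinModel()
pin.pinmode(0b10000)                      # placeholder mode value
pin.pinsetx(1000)                         # placeholder setting
pin.complete(42)                          # pretend the pin finished something
if pin.in_bit:
    value = pin.pinread()
    pin.pinack()
    print(value, pin.in_bit)
```

A cog polling IN (or taking an interrupt on it) would follow the same read-then-acknowledge order, so that a second completion is not missed.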
504 x 64 pins = 32256 ALMs for Smartpins. From what I've read I'm thinking that'll account for around 25% of total synthesised logic. That's not over the top I don't think given what is being achieved on a per pin basis. Any ideas if it all fits on the silicon?
11000 *  B-clk tx byte                   DIR     mode   transmit-repeat
11001 *  B-clk tx long                   DIR     mode   transmit-repeat
11010 *  B-clk rx byte                   DIR **  OUT    receive-repeat
11011 *  B-clk rx long                   DIR **  OUT    receive-repeat
11100 *  async tx byte                   DIR     mode   transmit-repeat
11101 *  async tx long                   DIR     mode   transmit-repeat
11110 *  async rx byte                   DIR **  OUT    receive-repeat
11111 *  async rx long                   DIR **  OUT    receive-repeat
That consumes 8 modes, and only gives a restrictive 8 or 32bit Serial length choices.
Would it not be more flexible, and free up 4 modes, with a defined-elsewhere Tx/Rx bit-count value ?
We don't have to worry about disparate duplication of naming like that but Chip has identified another trademarked use of SmartPins naming for IC functionality already. There was a discussion a few weeks back ... Pion or Pinion were a couple of names I liked.
OUCH for the silicon size. 14nm here we come <tongue in cheek>
UGH - I tried to do a block diagram of this as I read it.
I still wonder...
* Does every pin require this?
* Is it an overkill use of silicon?
* Is there another more flexible way? (micro cogs might be more flexible but use more power)
* Does it reflect what we need or overkill?
* Delay and risk?
* Is it an overkill use of silicon?
* Is there another more flexible way? (micro cogs might be more flexible but use more power)
Another choice is 1 cell per Pair of pins, but Chip has this now half duplex, needing two cells for a duplex UART, so the Pin-Pair saving is not as great.
That could still be a fall-back point, as Pin Pair will have some logic saving. (at the expense of less total capability numbers)
Present split allows nominally 1 RX and 63 Tx (etc) for example. Paired, that becomes ~32 duplex channels.
* Does it reflect what we need or overkill?
* Delay and risk?
It is actually fairly similar to Cypress latest PSoC 4200L Family, put in a blender....
They have 16b (?!) timer 'cells' copied 8 times
( Who puts 16b timers into a 32b MCU ?!)
They have USART cells for (i2c/SPI/UART), copied 4 times, with 10 registers to config/operate, plus 8 byte FIFOs.
The i2c mode has a nifty dual-port memory mapped mode, (EZ) where external i2c pins can access 32 bytes of register memory, with no local SW needed.
We need to start testing P2 or it will be another year
It's already a year off.
A side effect of the larger number of pin cells Chip has chosen is that helper USB support is less likely, as the size of each cell is now very important.
With 4 spare modes using my idea above, 2 of those could map to USB shifters, where
* NRZI and
* bit-stuff/destuff
* Sampling reSync (digital PLL)
are the sensible bit-level tasks to put into silicon.
None of that is huge logic, but if the pin-cells are identical, it does copy 64 times, which is rather an over-kill.
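For reference, the first two bit-level tasks are small enough to sketch in a few lines of Python. This follows the usual USB rules as I understand them (a 0 is inserted after six consecutive 1s; in NRZI a 0 toggles the line level and a 1 leaves it unchanged). The function names and the starting line level are assumptions for illustration, and the resync/PLL part is not covered.

```python
def bit_stuff(bits):
    """Insert a 0 after every run of six consecutive 1 bits."""
    out = []
    ones = 0
    for b in bits:
        out.append(b)
        if b == 1:
            ones += 1
            if ones == 6:
                out.append(0)   # stuffed bit forces a transition
                ones = 0
        else:
            ones = 0
    return out


def nrzi_encode(bits, level=1):
    """NRZI: a 0 bit toggles the line level, a 1 bit keeps it.

    The initial level is an assumption made for this example.
    """
    out = []
    for b in bits:
        if b == 0:
            level ^= 1
        out.append(level)
    return out


data = [1, 1, 1, 1, 1, 1, 1, 0]
stuffed = bit_stuff(data)
print(stuffed)
print(nrzi_encode(stuffed))
```

In hardware each of these is a small counter or a single flop plus an XOR, which is the point being made about the logic not being huge.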
I don't know, yet, what is needed, exactly, for USB in a smart pin. I know it's not overwhelming, but I also suspect much software is required on top of that before we could be sure that the smart pin implementation is correct.
It's true that at 180nm, complexity matters, but at something like 14nm, you couldn't design enough stuff to make a dent in what would be available, area-wise. It's frustrating that costs go so high, into the $millions, for a leading-edge process. All you could design would be 1mm square, and the rest could be RAM.
Sending the current smart pin to Weight Watchers would result in loss of functions. We could save area by equipping different pins with different smart pin functions, but it is really luxurious to be able to use any pin for anything.
I think we need to try out what's been developed, so far, and get a reading on it. It's kind of a new frontier in Prop land. I'll test everything in the morning and then get an update out. Documentation will be coming over the next few days.
Regarding the current 8- and 32-bit limitations on serial:
We could do ANY word length, but it adds a lot of muxes, which inflate the gate count. It would be better to pick the several word lengths that are likely to be needed, and implement maybe four or eight different ones.
The programmable modes could do 1..32-bit synchronous serial, but they could not double-buffer data like the dedicated serial modes do. So, maybe the problem is solved, already.
I suspect a 9-bit mode might be handy in the dedicated serial circuit, to accommodate parity. Personally, I like to handle errors at a much higher level, rather than having to react from the trenches.
Regarding the current 8- and 32-bit limitations on serial:
We could do ANY word length, but it adds a lot of muxes, which inflate the gate count. It would be better to pick the several word lengths that are likely to be needed, and implement maybe four or eight different ones.
I don't see a lot of muxes, as they are shifters, and a counter defines when to stop shifting and load ?
That means a 5b loadable unidirectional counter for length, which is quite compact in logic.
There is likely already a bit counter for 8 & 32, so it is adding 4 config bits.
I would expect 8 preset values to use more muxes/roms.
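A minimal Python sketch of the counter-defined length idea, for illustration: a 5-bit down-counter is loaded with (length - 1) and the shifter stops when it reaches zero, so any length from 1 to 32 falls out of the same logic. The LSB-first order and the name `shift_out` are assumptions, not something stated in the thread.

```python
def shift_out(word, nbits):
    """Shift out nbits (1..32) of word, LSB first, using a 5-bit down-counter."""
    if not 1 <= nbits <= 32:
        raise ValueError("nbits must be in 1..32")
    counter = (nbits - 1) & 0b11111   # 5-bit loadable counter holds length - 1
    shreg = word & 0xFFFFFFFF         # 32-bit shift register
    bits = []
    while True:
        bits.append(shreg & 1)        # present the current LSB
        shreg >>= 1
        if counter == 0:              # terminal count: stop shifting, ready to reload
            break
        counter -= 1
    return bits


print(shift_out(0b101, 3))            # a 3-bit word
print(len(shift_out(0x1FF, 9)))       # a 9-bit word, e.g. 8 data bits plus parity
```

The same counter handles 8 and 32 as just two of the possible load values, which is why the extra cost would be a few configuration bits rather than per-length muxing.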
JMG, if I'm reading your thinking correctly, you are saying that the shift/buffer registers are managed as 32-bit in terms of parallel data then only the shifting is controlled by the length parameter?
Comments
You should do stand up.
These smart pins are turning out to be quite deluxe.
.. and capture ?
And I'm thinking the Prop1 counters can be emulated too! There'll be some cheering, methinks.
Different from a sequencer then. Fun fun!
re: I've got the whole smart pin done!
Wow! Looks like a smorgasbord of great options there. We'll be experimenting for the next 50 years with all the new instructions and options.
Don't suppose you have a diagram to help understand it? (even if hand drawn)
I Googled "smart pins"... and it is ...
...
... a Google game!!!
"SMART PINS is a game by Google where you answer geographic questions by dropping pins on a map."
So... if the question is ... "where were Smart Pins developed?"... I would drop Google's smart pin on the home of Parallax's smart pins... right?
The Smart Pin is also the home of "The classic Antique Pewter Collection"
By the way... you can now add 13 output pins to your next Arduino project... for a mere $4.95 at MakerShed.
WOW about the functionality.
ie suppose having one per pin, drops RAM from 512k to 256k, then what is the trade off position ?
Just my comments!