I have totally lost the story on the serial I/O ideas for the PII.
I thought the idea was that ideally there would be some serial input and output shifters (SERDES) implemented with some simple support logic, such that many different protocols could be created with a little software support and much faster than bit-banging the bits.
Is it so that this turned out to be an impossible dream?
Because all this talk of SPI MASTER/SLAVE hardware makes me think we are edging toward the world of dedicated hardware peripherals in a most un-proplike manner.
- Chip posts the new update, with the great async serial 8/32 bit (optionally 4 bit addressable) UART
- many (including myself) would also like SER/DES, which pretty much requires an external clock for input, and a clock output (for synchronous modes)
- the feasible maximum externally clocked synchronous rate was discussed; it turns out clkfreq/1 would be far too much work and quite possibly impossible
- some other possible feature ideas were presented (e.g. Manchester, USB bit stuffing etc.) that turned out to be too much to support for P2
Now as long as serial clock out is supported, SPI master is automatically supported, especially if clock inversion is possible; the data can easily be inverted in software.
SPI slave needs a clock in. I posted an analysis a few messages earlier of the maximum bit rates possible with software-only or hardware-supported implementations.
I2C is so slow that (IMHO) it is best left to software bit-banging.
FYI, there are more opcodes that have changed than are listed in Chip's message at the head of this thread. I wrote a program to compare Chip's opcode table with the one in PropGCC and found numerous other differences.
Here is the output that was generated by my opcode comparison program. The lines that begin with "A" are added opcodes and the ones that begin with "C" are changed from the last P2 FPGA configuration. I've left out the ones that were deleted because the list is full of our special PropGCC pseudo-ops as well as actual deleted instructions. The two hex numbers that follow the "C" or "A" flag are the opcode and the opcode mask.
/* 000011 ZCR 1 CCCC DDDDDDDDD 000000010 ( COGINIT D ) (waits for hub) */
C "coginit", 0x0c400002, 0xfc4001ff (was 0c000000 fc400000)
/* 000011 Z00 1 CCCC 111111111 001iiiiii REPD #i (infinite repeat) */
C "repd", 0x0c43fe40, 0xfdc3ffc0 (was 0c400040 fd4001c0)
/* 000011 ZC0 1 CCCC 000000000 010010101 CLRACCA */
C "clracca", 0x0c400095, 0xfcc3ffff (was 0c400208 fc43ffff)
/* 000011 ZC0 1 CCCC 000000000 010010110 CLRACCB */
C "clraccb", 0x0c400096, 0xfcc3ffff (was 0c400408 fc43ffff)
/* 000011 ZC0 1 CCCC 000000000 010010111 CLRACCS */
C "clraccs", 0x0c400097, 0xfcc3ffff (was 0c400608 fc43ffff)
/* 000011 ZC0 1 CCCC 000000000 010011000 CACHEX */
C "cachex", 0x0c400098, 0xfcc3ffff (was 0c400008 fc43ffff)
/* 000011 ZCN 1 CCCC nnnnnnnnn 010011110 SETXCH D/#n */
C "setxch", 0x0c40009e, 0xfc4001ff (was 0c4000e8 fc4001ff)
/* 000011 ZCN 1 CCCC nnnnnnnnn 010011111 SETXFR D/#n */
C "setxfr", 0x0c40009f, 0xfc4001ff (was 0c4000e9 fc4001ff)
/* 000011 ZCN 1 CCCC nnnnnnnnn 010110010 SETPTRA D/#n */
C "setptra", 0x0c4000b2, 0xfc4001ff (was 0c4000b2 fd4001ff)
/* 000011 ZCN 1 CCCC nnnnnnnnn 010110011 SETPTRB D/#n */
C "setptrb", 0x0c4000b3, 0xfc4001ff (was 0c4000b3 fd4001ff)
/* 000011 ZCN 1 CCCC nnnnnnnnn 010110100 ADDPTRA D/#n */
C "addptra", 0x0c4000b4, 0xfc4001ff (was 0c4000b4 fd4001ff)
/* 000011 ZCN 1 CCCC nnnnnnnnn 010110101 ADDPTRB D/#n */
C "addptrb", 0x0c4000b5, 0xfc4001ff (was 0c4000b5 fd4001ff)
/* 000011 ZCN 1 CCCC nnnnnnnnn 010110110 SUBPTRA D/#n */
C "subptra", 0x0c4000b6, 0xfc4001ff (was 0c4000b6 fd4001ff)
/* 000011 ZCN 1 CCCC nnnnnnnnn 010110111 SUBPTRB D/#n */
C "subptrb", 0x0c4000b7, 0xfc4001ff (was 0c4000b7 fd4001ff)
/* 000011 ZCN 1 CCCC nnnnnnnnn 011001011 SETSKIP D/#n */
C "setskip", 0x0c4000cb, 0xfc4001ff (was 0c4000eb fc4001ff)
/* 000011 ZCN 1 CCCC nnnnnnnnn 011010101 SETDACS D/#n */
C "setdacs", 0x0c4000d5, 0xfc4001ff (was 0d0000d5 fc4001ff)
/* 000011 Z0N 1 CCCC nnnnnnnnn 011100010 SETQUAD D/#n */
C "setquad", 0x0c4000e2, 0xfd4001ff (was 0c4000e2 fc4001ff)
/* 000011 Z1N 1 CCCC nnnnnnnnn 011100010 SETQUAZ D/#n */
C "setquaz", 0x0d4000e2, 0xfd4001ff (was 0d4000e2 fc4001ff)
/* 000011 ZC0 1 CCCC 000000000 011110111 CAPCTRA */
C "capctra", 0x0c4000f7, 0xfcc3ffff (was 0c4000f7 fc4001ff)
/* 000011 ZC0 1 CCCC 000000000 011111111 CAPCTRB */
C "capctrb", 0x0c4000ff, 0xfcc3ffff (was 0c4000ff fc4001ff)
/* 000110 ZCR I CCCC DDDDDDDDD SSSSSSSSS ENC D,S */
C "enc", 0x18000000, 0xfc000000 (was 18800000 fc000000)
/* 000111 ZCR I CCCC DDDDDDDDD SSSSSSSSS JMPRET D,S */
C "jmpret", 0x1c000000, 0xfc000000 (was 1c800000 fc800000)
/* 010111 ZCR I CCCC DDDDDDDDD SSSSSSSSS JMPRETD D,S */
C "jmpretd", 0x5c000000, 0xfc000000 (was 5c800000 fc800000)
/* 100001 ZCR I CCCC DDDDDDDDD SSSSSSSSS SUB D,S */
C "sub", 0x84000000, 0xfc000000 (was 84800000 fc800000)
/* 110000 ZCR I CCCC DDDDDDDDD SSSSSSSSS CMPS D,S */
C "cmps", 0xc0000000, 0xfc000000 (was c0000000 fc800000)
/* 110011 ZCR I CCCC DDDDDDDDD SSSSSSSSS SUBX D,S */
C "subx", 0xcc000000, 0xfc000000 (was cc800000 fc800000)
/* 111000 ZCR I CCCC DDDDDDDDD SSSSSSSSS SUBR D,S */
C "subr", 0xe0000000, 0xfc000000 (was e0800000 fc800000)
/* 111001 ZCR I CCCC DDDDDDDDD SSSSSSSSS CMPSUB D,S */
C "cmpsub", 0xe4000000, 0xfc000000 (was e4800000 fc800000)
/* 000011 ZCR 1 CCCC DDDDDDDDD 000001000 SERINA D (waits for rx if single-task, loops if multi-task, releases if wc) */
A "serina", 0x0c400008, 0xfc4001ff
/* 000011 ZCR 1 CCCC DDDDDDDDD 000001001 SERINB D (waits for rx if single-task, loops if multi-task, releases if wc) */
A "serinb", 0x0c400009, 0xfc4001ff
/* 000011 ZCR 1 CCCC DDDDDDDDD 000101110 ESWAP4 D */
A "eswap4", 0x0c40002e, 0xfc4001ff
/* 000011 ZCR 1 CCCC DDDDDDDDD 000101111 ESWAP8 D */
A "eswap8", 0x0c40002f, 0xfc4001ff
/* 000011 ZCR 1 CCCC DDDDDDDDD 000111010 GETCOSA D */
A "getcosa", 0x0c40003a, 0xfc4001ff
/* 000011 ZCR 1 CCCC DDDDDDDDD 000111011 GETSINA D */
A "getsina", 0x0c40003b, 0xfc4001ff
/* 000011 ZCN 1 CCCC nnnnnnnnn 0100100-- <empty> */
A "", 0x0c400090, 0xfc4001fc
/* 000011 ZCN 1 CCCC DDDnnnnnn 010011001 SARACCA D/#n (waits for mac) */
A "saracca", 0x0c400099, 0xfc4001ff
/* 000011 ZCN 1 CCCC DDDnnnnnn 010011010 SARACCB D/#n (waits for mac) */
A "saraccb", 0x0c40009a, 0xfc4001ff
/* 000011 ZCN 1 CCCC DDDnnnnnn 010011011 SARACCS D/#n (waits for mac) */
A "saraccs", 0x0c40009b, 0xfc4001ff
/* 000011 ZCN 1 CCCC nnnnnnnnn 010011100 SEROUTA D/#n (waits for tx if single-task, loops if multi-task, releases if wc) */
A "serouta", 0x0c40009c, 0xfc4001ff
/* 000011 ZCN 1 CCCC nnnnnnnnn 010011101 SEROUTB D/#n (waits for tx if single-task, loops if multi-task, releases if wc) */
A "seroutb", 0x0c40009d, 0xfc4001ff
/* 000011 ZCN 1 CCCC nnnnnnnnn 011101000 SETPERA D/#n */
A "setpera", 0x0c4000e8, 0xfc4001ff
/* 000011 ZCN 1 CCCC nnnnnnnnn 011101001 SETSERA D/#n */
A "setsera", 0x0c4000e9, 0xfc4001ff
/* 000011 ZCN 1 CCCC nnnnnnnnn 011101010 SETPERB D/#n */
A "setperb", 0x0c4000ea, 0xfc4001ff
/* 000011 ZCN 1 CCCC nnnnnnnnn 011101011 SETSERB D/#n */
A "setserb", 0x0c4000eb, 0xfc4001ff
/* 000011 ZC0 1 CCCC 000000000 011110110 SYNCTRA (waits for ctra if single-task, loops if multi-task)) */
A "synctra", 0x0c4000f6, 0xfcc3ffff
/* 000011 ZC0 1 CCCC 000000000 011111110 SYNCTRB (waits for ctrb if single-task, loops if multi-task)) */
A "synctrb", 0x0c4000fe, 0xfcc3ffff
/* 000100 100 I CCCC DDDDDDDDD SSSSSSSSS MACA D,S */
A "maca", 0x12000000, 0xff800000
/* 000100 110 I CCCC DDDDDDDDD SSSSSSSSS MACB D,S */
A "macb", 0x13000000, 0xff800000
/* 000101 ZC1 I CCCC DDDDDDDDD SSSSSSSSS SCL D,S (waits one clock) */
A "scl", 0x14800000, 0xfc800000
/* 111000 000 I BBAA DDDDDDDDD SSSSSSSSS SETINDx D,S (SETINDA S / SETINDB D / SETINDS D,S) */
A "setindx", 0xe0000000, 0xff800000
/* 111001 000 I 0B0A DDDDDDDDD SSSSSSSSS FIXINDx D,S (FIXINDA D,S / FIXINDB D,S / FIXINDS D,S) */
A "fixindx", 0xe4000000, 0xffa80000
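For anyone unsure how the opcode and mask pairs in the listing are meant to be read, here is a minimal sketch in Python. It assumes the usual assembler/disassembler convention that an instruction word matches an entry when `(word & mask) == opcode`. The three table entries are copied from the listing above; the `decode` helper and the sample instruction words are made up for illustration and are not part of PropGCC.

```python
# (name, opcode, mask) triples copied from the comparison output above.
OPCODES = [
    ("coginit", 0x0C400002, 0xFC4001FF),
    ("serina",  0x0C400008, 0xFC4001FF),
    ("serouta", 0x0C40009C, 0xFC4001FF),
]

def decode(word):
    """Return the first mnemonic whose fixed bits match word, else None.

    Assumption: the mask has a 1 for every bit that is fixed by the
    instruction, and a 0 for operand/flag bits (Z, C, condition, D).
    """
    for name, opcode, mask in OPCODES:
        if (word & mask) == opcode:
            return name
    return None

# Hypothetical instruction words: the fixed bits come from the table,
# the D field (bits 17..9) and condition bits are filled in arbitrarily.
print(decode(0x0C7C0A02))               # fixed bits of COGINIT
print(decode(0x0C400008 | (5 << 9)))    # SERINA with D = 5
print(decode(0xFFFFFFFF))               # matches nothing in this small table
```

With this reading, a "C" line where only the mask changed (for example SETPTRA going from fd4001ff to fc4001ff) means one fewer bit is treated as fixed, which fits the later remark that some changes are only refinements of default bit states.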
Sorry about all that. Some of those changes weren't really changes, just refinements to the default bit states.
I have totally lost the story on the serial I/O ideas for the PII.
I thought the idea was that ideally there would be some serial input and output shifters (SERDES) implemented with some simple support logic, such that many different protocols could be created with a little software support and much faster than bit-banging the bits.
Is it so that this turned out to be an impossible dream?
Because all this talk of SPI MASTER/SLAVE hardware makes me think we are edging toward the world of dedicated hardware peripherals in a most un-proplike manner.
I think we'll get something good. Soon the sausage filling will squirt through the extruder and we'll put it in the casing.
Have a scheme where the pin number selects the mode: if odd, it's async serial; if even, it's sync serial with the clock on pin n-1. So if you select pin 1, it puts out an async serial stream on pin 1; if you choose pin 2, it puts out a serial stream on pin 2 and a clock on pin 1. Interestingly, this makes zero difference to the state machine; it just exposes the clock on pin n-1. It could still put out 8N1 serial on pin n, and if you disable the start and stop bits, you have an 8 or 32 bit shift register with a sync clock on pin n-1.
We also discussed the potential for defining the register width, so you might be able to do 1/2/4/8 bit register widths, where 4 would effectively be QSPI in sync mode. The clock would be pin n-1 and the data would be pin n to pin n+3, using 5 pins starting at pin n-1 and going to pin n+3. You couldn't have 4x 8 bit registers in a single port, but that's a minor limitation in the grand scheme. Of course, ports wider than 1 bit are not very useful outside of sync modes.
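To make the proposed pin mapping concrete, here is a small Python sketch of the rule as described in the two paragraphs above. This is only a model of the idea under discussion, not anything implemented in the FPGA image; the function name `pin_assignment` and the returned dictionary layout are invented for illustration, and rejecting widths above 1 on odd pins is an assumption based on the remark that wide ports only make sense in sync modes.

```python
def pin_assignment(n, width=1):
    """Model the proposed scheme: odd pin -> async, even pin -> sync.

    n     : the pin number written to the (hypothetical) serial config
    width : data width in pins, one of 1/2/4/8
    """
    if width not in (1, 2, 4, 8):
        raise ValueError("width must be 1, 2, 4 or 8")
    if n % 2 == 1:
        # Odd pin: plain async serial on pin n, no clock exposed.
        if width != 1:
            raise ValueError("assumed: widths above 1 only apply to sync mode")
        return {"mode": "async", "clock": None, "data": [n]}
    # Even pin: same state machine, but the bit clock appears on pin n-1
    # and the data occupies pins n .. n+width-1.
    return {"mode": "sync", "clock": n - 1, "data": list(range(n, n + width))}

print(pin_assignment(1))        # async on pin 1
print(pin_assignment(2))        # sync: data on pin 2, clock on pin 1
print(pin_assignment(2, 4))     # QSPI-like: clock on pin 1, data on pins 2..5
```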
We also discussed the idea of changing the bit clock definition so at higher speeds you have better granularity and can talk to legacy devices.
At 160 MHz the minimum bit clock speed is 2441 bps (160 MHz / 2^16), which is within 5% of 2400 baud. Since the practical minimum value of that register is 3, the idea was to reallocate the 16 bit scaler to something like a 4 + 10, where there is a 4 bit prescaler and a 10 bit scaler or, as Chip put it, a mantissa.
So there would be a 4 bit prescaler to divide the main clock down, the minimum value being 2, since /3 is the max reasonable speed. With 2^4 possible prescaler settings you could have power-of-two steps like the counter: 1/2/4/8/16/32/64/128/256.
Doing some math, the perfect split seems to be 2^4 prescaler and 2^11 mantissa, which gives you 305 bps at 160 MHz with /256 and 2048, which would be really clean and < half of 5% error.
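The figures quoted above are easy to check. The sketch below reproduces them in Python; it assumes the slowest rate is simply the clock divided by the largest prescaler and by the full range of the divider (2^16 for the flat 16 bit scaler, 256 x 2^11 for the proposed split). The helper names are invented for this example.

```python
CLK_HZ = 160_000_000  # system clock quoted in the post

def min_baud(clk_hz, max_prescale, divider_bits):
    """Slowest bit rate: clock / (largest prescaler * full divider range).

    Assumption: the divider can count a full 2**divider_bits clocks.
    """
    return clk_hz / (max_prescale * 2 ** divider_bits)

def percent_error(actual, target):
    """Relative difference between actual and target rate, in percent."""
    return 100.0 * abs(actual - target) / target

flat_16 = min_baud(CLK_HZ, 1, 16)    # existing 16 bit scaler, no prescaler
split = min_baud(CLK_HZ, 256, 11)    # proposed /256 prescaler + 2^11 mantissa

print(f"flat 16 bit: {flat_16:.1f} bps, {percent_error(flat_16, 2400):.2f}% from 2400")
print(f"/256 + 2^11: {split:.1f} bps, {percent_error(split, 300):.2f}% from 300")
```

Both cases come out a little under 2% away from the nearest standard rate. That is the error at the very bottom of the range, where the divider cannot go any lower; higher rates can pick a closer divider value.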
A 2.5% timing error is not good enough for 32-bit async. Even a 1.5% timing error either way will cause the 32nd bit to be sampled at the bit boundary, rather than anywhere near the center.
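The point about long frames can be put in numbers. This Python sketch assumes a receiver that synchronises once on the leading edge of a single start bit and then free-runs, so data bit k (counting from 1) is ideally sampled k + 0.5 bit times after that edge. The function name is made up for the example.

```python
def sample_offset(bit_index, clock_error):
    """How far (in bit times) the sample point has drifted from bit centre.

    bit_index   : data bit number, 1-based, after a single start bit
    clock_error : fractional baud mismatch, e.g. 0.015 for 1.5%
    Assumption: the receiver only resynchronises on the start edge.
    """
    ideal_time = bit_index + 0.5   # centre of data bit k, in bit times
    return ideal_time * clock_error

for error in (0.025, 0.015, 0.005):
    drift_8 = sample_offset(8, error)
    drift_32 = sample_offset(32, error)
    print(f"{error * 100:.1f}% error: bit 8 off by {drift_8:.3f}, "
          f"bit 32 off by {drift_32:.3f} bit times")
```

A drift of 0.5 bit times means sampling right on the bit boundary. At 1.5% the 32nd bit is about 0.49 bit times off centre, while an 8 bit frame at 2.5% is only about 0.21 off, which is why the tolerance that works for 8N1 does not carry over to 32 bit frames.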
Have a scheme where the pin definition if odd then async serial, if even, it's sync serial with clock on pin -1. So if you select pin 1, it puts out an async serial stream on pin 1, if you choose pin 2, it puts out a serial stream on pin 2, and clock on pin 1. Interestingly this makes zero difference to the state machine, it just exposes the clock on pin n-1, it could still put out 8n1 serial on pin n, if you disable start and stop bit, then you have an 8 or 32 bit shift register with sync clock on pin n-1.
Sounds good, handling pin-mapping was always going to be tricky....
We also discussed the potential for defining the register width, so you might be able to do 1/2/4/8 bit register widths, where 4 would effectively be QSPI in sync mode. The clock would be pin n-1 and the data would be pin n to pin n+3, using 5 pins starting at pin n-1 and going to pin n+3. You couldn't have 4x 8 bit registers in a single port, but that's a minor limitation in the grand scheme. Of course, having ports > 1 bit width are not very useful outside of sync modes.
We also discussed the idea of changing the bit clock definition so at higher speeds you have better granularity and can talk to legacy devices.
At 160Mhz the minimum bit clock speed is 2441 bps, which is within 5% of 2400 baud. Since the practical minimum value of that register is 3, the idea was to reallocate the 16 bit scaler to something like a 4 + 10, where there is a 4 bit prescaler and a 10 bit scaler, as Chip put it, a mantissa.
So some 4 bit prescaler to divide the main clock down, the minimum value being 2, since /3 is the max reasonable speed. If you have something like 2^4 possible prescalers, like the counter 1/2/4/8/16/32/64/128/256.
Doing some math, the perfect split seems to be 2^4 prescaler and 2^11 mantissa, which gives you 305bps at 160Mhz with /256 and 2048, which would be really clean and < half of 5% error.
Good. I guess that has dropped bits, to make room for other control flags ?
The 2^4 could be common between TX and RX, to free up more control bits ?
... Because all this talk of SPI MASTER/SLAVE hardware makes me think we are edging toward the world of dedicated hardware peripherals in a most un-proplike manner.
Not really, SPI is the simplest serial interface possible, and it makes sense to do the bit-level stuff in hardware, leaving software to manage the byte-level shuffling.
Doing that allows the Prop to match the speeds of other connected devices, and frees valuable COG code space.
Could an instruction (and appropriate logic) be added to set whether a pin output is inverted (or maybe latched value is negated)? This one instruction would allow for setting both phase (for SPI CLK) and polarity (for SPI and UART). This would also have general applicability to interfacing with any peripheral where logic true is low.
A 2.5% timing error is not good enough for 32-bit async. Even a 1.5% timing error either way will cause the 32nd bit to be sampled at the bit boundary, rather than anywhere near the center.
true, but a 2^11 counter gives ~0.1% granularity mid-scale.
Even at 115200, with a 160MHz starting clk, and an assumed /2 in the State engine for mid-bit sampling, we have
160M/2/115200 = 694 for N, or a granularity of 0.144% (and a virtual baud clock of 80MHz)
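The same calculation in Python, for anyone who wants to try other rates. It assumes, as the post does, a 160 MHz clock and a fixed /2 in the state engine, and takes the granularity as one count of the divider N. The function name and the returned tuple are invented for this sketch.

```python
def divisor_and_granularity(clk_hz, baud, state_div=2):
    """Return (N, step, error) for an integer baud divider.

    N     : integer divider value (truncated, as in the post)
    step  : fractional change in bit rate per count of N
    error : fractional error of the rate actually produced with N
    """
    n = int(clk_hz / state_div / baud)
    actual = clk_hz / state_div / n
    return n, 1.0 / n, abs(actual - baud) / baud

n, step, error = divisor_and_granularity(160_000_000, 115_200)
print(f"N = {n}, granularity = {step * 100:.3f}%, rate error = {error * 100:.3f}%")
```

At 115200 the actual rate error with N = 694 is well below the 0.144% step size, so granularity of this order leaves plenty of margin even against the <1% and 32 bit frame concerns raised elsewhere in the thread.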
Yes, I'm all in favor of having serial shifting in hardware. As I said a few posts back. The trick is to make it flexible, not just UART, not just SPI, but configurable for many things. Perhaps even things we have not thought of yet.
Sounds like Chip has a handle on it.
Agreed.
If there is control-register space, I'd also like to see granular length control (expanding 8 & 32), since 9 will also be needed, and some systems like 2 stop bits. It seems a 5 bit bit-frame-length is the simplest way to manage this.
That also then gives a nice way to send a precise break for LIN and similar protocols.
Any specialized serial output circuitry is redundant, isn't it? You can use the video circuitry to output fast async, isosync, SPI, etc. It's just the input that's a problem. Unfortunately, while output is mode-agnostic (IOW you can load the video registers with async data or clock and data, etc.), input is not, due to various word-length and synchronization standards.
This serial thing is being taken to extremes. Originally we (many here) just wanted a really simple basic serialiser and deserialiser with a flexible clocking scheme.
What Chip did was create for us a souped up version that did async complete with start and stop bits, 8 bit characters or 32 bit characters, with an optional additional 4 bit addressing scheme. This is great for UART replacement and for interprop comms, but does not work as a basic serialiser/deserialiser. Therefore its functionality went too far to be useful in the many ways most of us originally conceived.
Now we are talking about also doing SPI Master only, followed by a reasonable argument Slave is also important. I agree both Master and Slave are important. Both of these could be achieved with a little more software help using a general purpose serialiser/deserialiser. But a SPI Master & Slave does not fit the bill for a simple serialiser/deserialiser.
Then it was mentioned that bit(un)stuffing was too complicated so we should forget USB (and Ethernet). NOT SO!!!
With a serialiser/deserialiser, software assist can do bit(un)stuffing. It has been done for USB for both LS (1.5 Mbit/s IIRC) and FS (12 Mbit/s) already. Before those opponents point out that it is not a compliant USB, it works, and a lot of implementations using micros out there are not compliant and work.
There is P1 code from BradC and Scanlime that does USB LS and FS respectively. I have done some work on this too, although I have not posted it. I understand the raw data stream, but not the higher level.
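As a rough illustration of how little logic the software side of bit (un)stuffing needs, here is a Python sketch of the USB rule: after six consecutive 1 bits in the data stream, the transmitter inserts a 0, and the receiver drops it again. This works on plain lists of bits before NRZI encoding; it is not taken from the BradC or Scanlime code, and the function names are made up.

```python
def stuff(bits):
    """Insert a 0 after every run of six consecutive 1 bits."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == 6:
            out.append(0)   # stuffed bit forces a transition after NRZI
            run = 0
    return out

def unstuff(bits):
    """Remove the 0 that follows every run of six consecutive 1 bits."""
    out, run, expect_stuffed = [], 0, False
    for b in bits:
        if expect_stuffed:
            if b != 0:
                raise ValueError("bit-stuff error: seven 1 bits in a row")
            expect_stuffed = False   # drop the stuffed 0
            run = 0
            continue
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == 6:
            expect_stuffed = True
    return out

payload = [1] * 8 + [0, 1, 1]
wire = stuff(payload)
print(wire)
print(unstuff(wire) == payload)
```

On real hardware this would sit between the cog software and a raw shifter, which is the division of labour being argued for here: hardware moves bits at speed, software handles the protocol quirks.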
I believe we can already invert any input or output pin by setting the pin configuration. Is this correct? If so, no need to invert the datastream or clocks, whether input or output.
I need to point out that <1% is required for baud rates. If not, then forget it. Apple made this mistake in the mid 80's on the Apple //c and I had to correct this in the firmware of the Apple Modem 1200 that I designed.
So what would a basic/general serialiser and deserialiser do...
Serialiser (output/transmit):
clock: As Chip has already done, plus an external pin option
enable/disable shifting command
No start/stop bits in hardware (else enable/disable start & stop bits)
LSB first (else option for LSB/MSB first)
"1" shifted in at the far end when shift occurs (else option to shift in 0/1) (this effectively shifts in stop bits)
32 bits (else option for shorter shifts 1..32 bits)
access to the 32bit shift register should preferably be both read/write
Instead of read access, it might be nice to know how many bits have been shifted out since last loaded (6 bit counter so underrun could be detected?)
Deserialiser (input/receive):
clock: As Chip has already done, plus an external pin option
enable/disable shifting command
No start/stop bits in hardware (else enable/disable start & stop bit detection)
Would be nice to specify wait for pin change (or 0/1) on enable - in this case, sampling would begin 1/2 bit after the change
LSB first (else option for LSB/MSB first)
32 bits (else option for shorter shifts 1.. 32 bits)
access to the 32bit shift register should preferably be both read/write
WZ could specify zero the register on read instruction
As for transmit, it might be nice to know how many shifts have occurred since the last read/clear (6 bit counter so overrun can be detected)
By being able to write to this register, we could write #1 which would then permit software to determine how many shifts had occurred (<30)
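The serialiser half of the wish list above can be modelled in a few lines of Python to show how it would be used. This is a behavioural sketch only: the class, its method names and the `uart_frame` helper are invented, and it assumes the base options from the list (32 bits, LSB first, "1" shifted in at the far end, a shift counter). It also shows that software has to place the start bit itself when hardware start/stop bits are off.

```python
MASK32 = 0xFFFFFFFF

class Serialiser:
    """32 bit, LSB-first output shifter that shifts in 1s at the far end."""

    def __init__(self):
        self.reg = MASK32   # all ones: the line idles high
        self.count = 0      # shifts since last load (underrun detection)

    def load(self, word):
        self.reg = word & MASK32
        self.count = 0

    def shift(self):
        """One bit clock: emit bit 0, shift right, fill bit 31 with a 1."""
        bit = self.reg & 1
        self.reg = (self.reg >> 1) | 0x80000000
        self.count += 1
        return bit

def uart_frame(byte):
    """Build an 8N1 frame in software: start bit 0, 8 data bits, then 1s.

    The upper bits are preloaded with 1s so the stop bit and the idle
    line follow the data without any hardware framing support.
    """
    return 0xFFFFFE00 | ((byte & 0xFF) << 1)

tx = Serialiser()
tx.load(uart_frame(0x55))
print([tx.shift() for _ in range(10)])   # start, 8 data bits LSB first, stop
print(tx.count, hex(tx.reg))             # 10 shifts done, register back to idle
```

Other framings (9 bit, 2 stop bits, a long break) would only change how the word is built before loading, which is the flexibility argument being made for a plain shifter.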
What could we do with this hardware...
UART 5/6/7/8 bits, N/O/E parity, 1/1.5/2 stop (or almost any other weird async protocol)
USB LS & FS (because we can stuff/unstuff in software)
SPI Master & Slave
I2C (can be bit banged)
PS2
1 wire
Probably 10bit Ethernet
Others???
PWM etc
Do we need an A & B version per cog? Probably not.
<snip>
- access to the 32bit shift register should preferably be both read/write
- As for transmit, it might be nice to know how many shifts have occurred since the last read/clear (6 bit counter so overrun can be detected)
- By being able to write to this register, we could write #1 which would then permit software to determine how many shifts had occurred (<30)
Mostly I agree with [else] options listed, but the detail above I'm less clear on.
Checking how many bits have gone is useful (certainly TX_Done is needed for things like RS485) but the shift register itself is usually buffered invisibly, and so not shift-visible.
Making it shift visible costs another register map, for little gain in information.
What could be done, is to make the Length register read/write, so write defines the (usually fixed) bit-length load, and read reads back the shifted bit down counter, which is counting down during shifts. (& reloads with length when done )
In SPI these values are the same size, but it gets a little trickier in Async, as a 9 bit UART has at least 11 counts. Maybe a read here could return state bits and some, but not all, of the bit counter bits?
'Done' would have a unique value, as would 'waiting on start', but unless the field was expanded, full granular tracking of Async would not be possible.
Maybe there is room to expand on read, without bumping the register space ?
From the POV of reception, what do SPI and serial async have in common? Almost nothing. Just a shift register. What does a shift register (or two) cost in terms of silicon real estate? Almost nothing.
While a universal serial receiver would be a great goal conceptually, in a practical sense it leverages almost no common circuitry. So why make them one block schematically? Why not provide a separate UART, which Chip has already done, and a simple SPI?
Chip:
Can you think of a reason why IJZ/IJZD/IJNZ/IJNZD and DJZ/DJZD/DJNZ/DJNZD would ever use the r=0 (NR) option?
If not, might it be prudent to move the TJZ/TJZD/TJNZ/TJNZD to the IJxx opcode 111100 with r=0 (NR) and JP/JPD/JNP/JNPD to the DJxx opcode 111101 with r=0 (NR) ?
This would save the whole opcode 111110 for some possible future use.
Any specialized serial output circuitry is redundant, isn't it? You can use the video circuitry to output fast async, isosync, SPI, etc. It's just the input that's a problem. Unfortunately, while output is mode-agnostic (IOW you can load the video registers with async data or clock and data, etc.), input is not, due to various word-length and synchronization standards.
-Phil
The video isn't a binary shift register anymore, it outputs color data via DACs with a lookup table. Not the same beast as the Prop 1 and can't be abused in the same way. You could have 2 DAC entries for rail to rail, but it still has fixed output modes, like RGB or NTSC, using multiple pins or specialized encoding.
Comments
I thought the idea was that ideally there would be some serial input and output shifters (SERDES) implemented with some simple support logic, such that many different protocols could be created with a little software support and much faster than bit-banging the bits.
Is it so that this turned out to be an impossible dream?
Because all this talk of SPI MASTER/SLAVE hardware makes me think we are edging toward the world of dedicated hardware peripherals in a most un-proplike manner.
- Chip posts the new update, with the great async serial 8/32 bit (optionally 4 bit addressable) uart
- many (including myself) would also like SER/DES, which pretty much requires an external clock for input, and output for external clock (for synchronous modes)
- the feasible maximum externally clocked synchronous rate was discussed, turns out clkfreq/1 would be far too much work and quite possibly impossible
- some other possible feature ideas were presented - ie manchester, usb bit stuffing etc - that turned out to be too much to support for P2
Now as long as serial clock out is supported, SPI master is automatically supported, especially if clock inversion is possible; the data can easily be inverted in software
SPI slave needs a clock in, I posted an analysis a few messages earlier of maximum bit rates possible with different software only, or hardware supported implementation.
I2C is so slow that (IMHO) it is best left to software bit-banging
SPI is so simple that I don't think the code would get any smaller. A hardware peripheral would just make it faster and hands-free.
Sorry about all that. Some of those changes weren't really changes, just refinements to the default bit states.
Yes, those are all R=1 instructions, by default.
I think we'll get something good. Soon the sausage filling will squirt through the extruder and we'll put it in the casing.
Have a flag for disabling start bit and stop bit.
Have a scheme where the pin definition if odd then async serial, if even, it's sync serial with clock on pin -1. So if you select pin 1, it puts out an async serial stream on pin 1, if you choose pin 2, it puts out a serial stream on pin 2, and clock on pin 1. Interestingly this makes zero difference to the state machine, it just exposes the clock on pin n-1, it could still put out 8n1 serial on pin n, if you disable start and stop bit, then you have an 8 or 32 bit shift register with sync clock on pin n-1.
We also discussed the potential for defining the register width, so you might be able to do 1/2/4/8 bit register widths, where 4 would effectively be QSPI in sync mode. The clock would be pin n-1 and the data would be pin n to pin n+3, using 5 pins starting at pin n-1 and going to pin n+3. You couldn't have 4x 8 bit registers in a single port, but that's a minor limitation in the grand scheme. Of course, having ports > 1 bit width are not very useful outside of sync modes.
We also discussed the idea of changing the bit clock definition so at higher speeds you have better granularity and can talk to legacy devices.
At 160Mhz the minimum bit clock speed is 2441 bps, which is within 5% of 2400 baud. Since the practical minimum value of that register is 3, the idea was to reallocate the 16 bit scaler to something like a 4 + 10, where there is a 4 bit prescaler and a 10 bit scaler, as Chip put it, a mantissa.
So some 4 bit prescaler to divide the main clock down, the minimum value being 2, since /3 is the max reasonable speed. The 4 bits give you something like 2^4 possible prescalers, like the counter: 1/2/4/8/16/32/64/128/256.
Doing some math, the perfect split seems to be a 2^4 prescaler and a 2^11 mantissa, which gives you 305 bps at 160 MHz with /256 and 2048. That would be really clean, and less than half of 5% error.
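The arithmetic behind those figures can be checked with a few lines of Python. This is only a sketch: it assumes the slowest bit clock is the system clock divided by the prescaler and then by the largest scaler count (2^bits), and the function name `min_bit_rate` is made up for illustration.

```python
CLK_HZ = 160_000_000  # system clock used in the discussion


def min_bit_rate(clk_hz, prescale, scaler_bits):
    """Slowest bit rate: clock / prescaler / largest scaler count (2**bits)."""
    return clk_hz / prescale / (1 << scaler_bits)


# Plain 16-bit scaler, no prescaler.
print(round(min_bit_rate(CLK_HZ, 1, 16), 1))
# /256 prescaler with a 10-bit mantissa.
print(round(min_bit_rate(CLK_HZ, 256, 10), 1))
# /256 prescaler with an 11-bit mantissa.
print(round(min_bit_rate(CLK_HZ, 256, 11), 1))
```

The first and last lines reproduce the 2441 bps and 305 bps values quoted above; the middle one shows why a 10-bit mantissa bottoms out at roughly twice that.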
Perhaps not so un-proplike if each cog is equipped identically. Timer and video generator hardware already set a precedent in this respect.
A 2.5% timing error is not good enough for 32-bit async. Even a 1.5% timing error either way will cause the 32nd bit to be sampled at the bit boundary, rather than anywhere near the center.
-Phil
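A rough way to see the point about 32-bit frames is to work out how far the sample point drifts from the bit centre. The Python sketch below assumes the receiver synchronises once on the leading edge of the start bit and then samples each data bit at its nominal centre; `sample_offset` is a hypothetical helper, not anything from the P2 design.

```python
def sample_offset(bit_index, clock_error):
    """Drift from the bit centre, in bit times, at data bit `bit_index` (1-based).

    The centre of data bit k lies k + 0.5 bit times after the start edge
    (the start bit itself occupies the first bit time), so a fractional
    clock error accumulates over that whole interval.
    """
    return (bit_index + 0.5) * clock_error


for bits, err in ((8, 0.025), (32, 0.025), (32, 0.015)):
    print(bits, err, round(sample_offset(bits, err), 4))
```

Under these assumptions a 1.5% error puts the 32nd sample almost half a bit off centre, which is the bit boundary, while the same error over an 8-bit frame would still land comfortably inside the bit.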
Great, and also a means to set clock phase and polarity, as in standard SPI?
Sounds good, handling pin-mapping was always going to be tricky....
nicely solves the mapping issues of Quad SPI
Good. I guess that has dropped bits, to make room for other control flags ?
The 2^4 could be common between TX and RX, to free up more control bits ?
It just worries me.
The timers are kind of nice generic bits of hardware with many modes. The video, well, it's an odd thing the Prop has that most other MCUs don't.
Which is a bit different than saying "this is SPI hardware, it does SPI and nothing else".
Still, if it's small and fits and all COGs have the same, I might be fussing too much.
Not really, SPI is the simplest serial interface possible, and it makes sense to do the bit-level stuff in hardware, leaving software to manage the byte-level shuffling.
Doing that allows the Prop to match the speeds of other connected devices, and frees valuable COG code space.
I think it's more like "This is UART hardware, it does limited UART and nothing else. Can we make it more generic to include SPI?"
true, but a 2^11 counter gives ~0.1% granularity mid-scale.
Even at 115200, with a 160MHz starting clk, and an assumed /2 in the State engine for mid-bit sampling, we have
160M/2/115200 = 694 for N, or a granularity of 0.144% (and a virtual baud clock of 80MHz)
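The same calculation as a small Python sketch, for anyone who wants to try other baud rates. It assumes, as above, a /2 in the state engine for mid-bit sampling; `divisor_and_step` is an illustrative name only.

```python
def divisor_and_step(clk_hz, baud, state_div=2):
    """Return (N, relative step size) for the nearest integer divisor N.

    One count of N changes the bit rate by roughly 1/N, which is the
    granularity figure quoted in the post.
    """
    n = round(clk_hz / state_div / baud)
    return n, 1.0 / n


n, step = divisor_and_step(160_000_000, 115_200)
actual = 160_000_000 / 2 / n
print(n, f"{step * 100:.3f}%", f"{actual:.1f} baud")
```

Because N is rounded to the nearest integer, the worst-case baud error is about half of the step size.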
Yes, I'm all in favor of having serial shifting in hardware. As I said a few posts back. The trick is to make it flexible, not just UART, not just SPI, but configurable for many things. Perhaps even things we have not thought of yet.
Sounds like Chip has a handle on it.
Agreed.
If there is control-register space, I'd also like to see granular length control (expanding 8 & 32), since 9 will also be needed, and some systems like 2 stop bits, it seems a 5 bit bit-frame-length is the simplest way to manage this.
That also then gives a nice way to send a precise break for LIN and similar protocols.
-Phil
What Chip did was create for us a souped up version that did async complete with start and stop bits, 8 bit characters or 32 bit characters, with an optional additional 4 bit addressing scheme. This is great for UART replacement and for interprop comms, but does not work as a basic serialiser/deserialiser. Therefore its functionality went too far to be useful in the many ways most of us originally conceived.
Now we are talking about also doing SPI Master only, followed by a reasonable argument Slave is also important. I agree both Master and Slave are important. Both of these could be achieved with a little more software help using a general purpose serialiser/deserialiser. But a SPI Master & Slave does not fit the bill for a simple serialiser/deserialiser.
Then it was mentioned that bit(un)stuffing was too complicated so we should forget USB (and Ethernet). NOT SO!!!
With a serialiser/deserialiser, software assist can do bit (un)stuffing. It has already been done for USB at both LS (1.5 Mbit/s IIRC) and FS (12 Mbit/s). Before opponents point out that it is not compliant USB: it works, and a lot of implementations using micros out there are not compliant and still work.
There is P1 code from BradC and Scanlime that do USB LS and FS respectively. I have done some work on this too, although I have not posted it. I understand the raw data stream, but not the higher level.
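For readers who have not met it, USB bit stuffing inserts a 0 after every run of six consecutive 1s in the data stream (before NRZI encoding), and the receiver drops that bit again. The Python sketch below only illustrates the rule on lists of bits; it is not the BradC or Scanlime code, and a real implementation would do this in a cog alongside the shifter.

```python
def stuff(bits):
    """Insert a 0 after every run of six consecutive 1 bits."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b else 0
        if run == 6:
            out.append(0)   # stuffed bit forces a transition
            run = 0
    return out


def unstuff(bits):
    """Remove stuffed bits; raise if a seventh consecutive 1 is seen."""
    out, run = [], 0
    it = iter(bits)
    for b in it:
        out.append(b)
        run = run + 1 if b else 0
        if run == 6:
            stuffed = next(it, None)   # the bit after six 1s is discarded
            if stuffed == 1:
                raise ValueError("bit-stuff error")
            run = 0
    return out


data = [1] * 8 + [0, 1, 1]
print(stuff(data))
print(unstuff(stuff(data)) == data)
```

The per-bit work is a counter and a compare, which is the kind of job that fits in software as long as the hardware shifter handles the bit timing.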
I believe we can already invert any input or output pin by setting the pin configuration. Is this correct? If so, no need to invert the datastream or clocks, whether input or output.
I need to point out that <1% is required for baud rates. If not, then forget it. Apple made this mistake in the mid 80's on the Apple //c and I had to correct this in the firmware of the Apple Modem 1200 that I designed.
So what would a basic/general serialiser and deserialiser do...
Serialiser (output/transmit):
- clock: As per Chip has done already plus external pin option
- enable/disable shifting command
- No start/stop bits in hardware (else enable/disable start & stop bits)
- LSB first (else option for LSB/MSB first)
- "1" shifted in at the far end when shift occurs (else option to shift in 0/1) (this effectively shifts in stop bits)
- 32 bits (else option for shorter shifts 1..32 bits)
- access to the 32bit shift register should preferably be both read/write
- Instead of read access, it might be nice to know how many bits have been shifted out since last loaded (6 bit counter so underrun could be detected?)
Deserialiser (input/receive):
- clock: As per Chip has done already plus external pin option
- enable/disable shifting command
- No start/stop bits in hardware (else enable/disable start & stop bit detection)
- Would be nice to specify wait for pin change (or 0/1) on enable - in this case, sampling would begin 1/2 bit after the change
- LSB first (else option for LSB/MSB first)
- 32 bits (else option for shorter shifts 1.. 32 bits)
- access to the 32bit shift register should preferably be both read/write
- WZ could specify zero the register on read instruction
- As for transmit, it might be nice to know how many shifts have occurred since the last read/clear (6 bit counter so overrun can be detected)
- By being able to write to this register, we could write #1, which would then permit software to determine how many shifts had occurred (<30)
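As a rough illustration of the transmit side of that wish list, here is a small Python model of a 32-bit serialiser that shifts LSB first, shifts a "1" in at the far end, and keeps a 6-bit count of shifts since the last load. The class and the `frame_8n1` helper are hypothetical; they only show how software could build start and stop bits on top of a plain shifter.

```python
MASK32 = 0xFFFFFFFF


class Serialiser:
    """Software model of the proposed output shifter (an assumption, not P2 hardware)."""

    def __init__(self):
        self.reg = MASK32   # idle line is all 1s
        self.count = 0      # shifts since last load (6-bit counter)

    def load(self, value):
        self.reg = value & MASK32
        self.count = 0

    def shift(self):
        """Shift one bit out, LSB first; a 1 enters at the top."""
        bit = self.reg & 1
        self.reg = (self.reg >> 1) | 0x80000000
        self.count = (self.count + 1) & 0x3F
        return bit


def frame_8n1(byte):
    """Build an 8N1 frame in software: start bit 0, 8 data bits, then 1s."""
    return ((0xFFFFFF00 | (byte & 0xFF)) << 1) & MASK32


tx = Serialiser()
tx.load(frame_8n1(0x55))
print([tx.shift() for _ in range(10)], tx.count)
```

Because 1s are shifted in at the far end, the stop bit and the idle line come for free, and a count larger than the frame length tells software that the shifter has underrun.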
What could we do with this hardware...
- UART 5/6/7/8 bits, N/O/E parity, 1/1.5/2 stop (or almost any other weird async protocol)
- USB LS & FS (because we can stuff/unstuff in software)
- SPI Master & Slave
- I2C (can be bit banged)
- PS2
- 1 wire
- Probably 10bit Ethernet
- Others???
- PWM etc
Do we need an A & B version per cog? Probably not.
One would be nice, two would be better, but more is always better, I'd be happy with one.
How do you see pins being assigned? Each pin individually addressable, or a group of pins like the video generator?
C.W.
Mostly I agree with [else] options listed, but the detail above I'm less clear on.
Checking how many bits have gone is useful (certainly TX_Done is needed for things like RS485) but the shift register itself is usually buffered invisibly, and so not shift-visible.
Making it shift visible costs another register map, for little gain in information.
What could be done, is to make the Length register read/write, so write defines the (usually fixed) bit-length load, and read reads back the shifted bit down counter, which is counting down during shifts. (& reloads with length when done )
In SPI these values are same size, but get a little trickier in Async, as a 9 bit uart has at least 11 counts, but maybe read here could get state bits and some, but not all, bit counter bits ?
'Done' would have a unique value, as would 'waiting on start', but unless the field was expanded, full granular tracking of Async would not be possible.
Maybe there is room to expand on read, without bumping the register space ?
While a universal serial receiver would be a great goal conceptually, in a practical sense it leverages almost no common circuitry. So why make them one block schematically? Why not provide a separate UART, which Chip has already done, and a simple SPI?
Two would allow a single COG to do a wide range of bridging tasks, and make SW management much easier.
See post #131 - the thinking there is that each pin is addressable, but multi-pin modes follow a leader pin.
Can you think of a reason why IJZ/IJZD/IJNZ/IJNZD and DJZ/DJZD/DJNZ/DJNZD would ever use the r=0 (NR) option?
If not, might it be prudent to move the TJZ/TJZD/TJNZ/TJNZD to the IJxx opcode 111100 with r=0 (NR) and JP/JPD/JNP/JNPD to the DJxx opcode 111101 with r=0 (NR) ?
This would save the whole opcode 111110 for some possible future use.
The video isn't a binary shift register anymore, it outputs color data via DACs with a lookup table. Not the same beast as the Prop 1 and can't be abused in the same way. You could have 2 DAC entries for rail to rail, but it still has fixed output modes, like RGB or NTSC, using multiple pins or specialized encoding.