Ozpropdev, I suppose you've already realized this, but you will need to modify your tools to accommodate the moved instructions (WAITX/RFVAR/RFVARS) and the new instruction, ONES. Also, the data reported back from GETINT and SETBRK has been changed and expanded. These changes may affect your single-step debugger.
I've had some low-level anxiety for a week over the flakiness that Peter was experiencing. I really hope we'll discover any lurking problems. It would be good to know what caused his troubles. And I don't understand why that SD card mux caused things to lock up. Very strange, but now that I've hard-coded the SD card pins into P61..P58, things seem to be okay.
Chip,
Do you want a couple of new instructions?
If so, there were a couple you had intended to implement way back ...
1. A basic instruction for helping calculate CRC
2. An instruction to help accumulate NRZI serial data, e.g. USB. However, we have smart pins that can do this, so it would just provide some additional bit-bang help.
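For reference, USB's NRZI line code signals a 0 bit as a change of line state and a 1 bit as no change. The transmitter also stuffs a 0 after six consecutive 1s so the line keeps transitioning for clock recovery. The Python model below sketches the encode/decode and stuff/unstuff steps that an NRZI helper instruction (or a smart pin) would be offloading. The function names and the starting line level of 1 are assumptions for illustration, not P2 features.

```python
def nrzi_encode(bits, level=1):
    """USB-style NRZI: a 0 toggles the line level, a 1 leaves it unchanged.
    'level' is the assumed starting (idle) line state."""
    out = []
    for b in bits:
        if b == 0:
            level ^= 1
        out.append(level)
    return out


def nrzi_decode(levels, level=1):
    """Inverse of nrzi_encode: no change -> 1, change -> 0."""
    out = []
    for lv in levels:
        out.append(1 if lv == level else 0)
        level = lv
    return out


def bit_stuff(bits):
    """Insert a 0 after every run of six consecutive 1s (USB bit stuffing)."""
    out, ones = [], 0
    for b in bits:
        out.append(b)
        ones = ones + 1 if b else 0
        if ones == 6:
            out.append(0)
            ones = 0
    return out


def bit_unstuff(bits):
    """Remove the bit that follows each run of six 1s.
    A 1 in that position would be a stuffing error; this sketch just drops it."""
    out, ones, skip = [], 0, False
    for b in bits:
        if skip:
            skip, ones = False, 0
            continue
        out.append(b)
        ones = ones + 1 if b else 0
        if ones == 6:
            skip = True
    return out


payload = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0]
line = nrzi_encode(bit_stuff(payload))
assert bit_unstuff(nrzi_decode(line)) == payload
```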
1) I think this would be very useful for USB intra-packet calculations. What does it look like, though?
I will dig out the old thread(s) and refresh myself and then post an update/link.
I worked out that it was fairly simple to have a CRC bit calculator, where the new bit was in the C or Z flag. This way the CRC variant (CRC16, CCITT, etc.) could be "programmed" in software. It could operate in its own "adjacent" cog pair to get parallel throughput.
Meanwhile, I am also working on the minimal SD boot code and v28.
BTW I am really pleased with the widening of the various registers to 32 bits. It's just one less thing to think about.
Okay. A few things have been upgraded and fixed. I'm compiling a V29 now:
- PTRA/PTRB are now 32-bit (they were 20-bit)
- The hardware stack is now 32-bit (it was 22-bit)
- {SETQ+}COGINIT data paths are now 32-bit (they were 20-bit)
- C/Z flag reporting for ENCOD/ONES has been fixed (C/Z were swapped)
- ENCOD sets C if S>0 (operation was valid)
- New test and branch instructions: TJO/TJE/TJM/TJV
- SUMxx instructions now report 'correct sign' into C for TJV compatibility
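A small Python model may help when checking FPGA results against the flag behaviour listed above. It makes these assumptions:

- ENCOD returns the bit position of the top-most 1 in S, and returns 0 when S is 0.
- Z reflects a zero result for both ENCOD and ONES, as ozpropdev expected later in the thread.
- C for ENCOD follows the "S>0" rule in the list.

The ONES C flag is not modelled because the thread does not say what it reports.

```python
MASK32 = 0xFFFF_FFFF


def encod(s):
    """Assumed ENCOD semantics: position (0..31) of the top-most 1 bit in S."""
    s &= MASK32
    result = s.bit_length() - 1 if s else 0  # assumption: S == 0 yields 0
    c = s > 0          # from the V29 notes: C set if S > 0 (operation valid)
    z = result == 0    # assumption: Z reflects a zero result
    return result, c, z


def ones(s):
    """ONES: count of 1 bits in S (0..32). Only Z is modelled here."""
    result = bin(s & MASK32).count("1")
    z = result == 0    # assumption: Z reflects a zero result
    return result, z


print(encod(0x8000_0000))  # (31, True, False)
print(ones(0x0000_F0F0))   # (8, False)
```

Under these assumptions, encod(0) and encod(1) both return 0, and only C tells them apart. That is presumably why the V29 change makes C report S>0.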
Attached is a simple spin program for the P1 to calculate any CRC.
It supports various polynomials and numbers of bits, LSB or MSB first, a preset CRC initial value, an XOR final value, and sending the LSB or MSB CRC byte first.
But a general-purpose CRC is better.
Would some of you please test/modify this program and check it works?
What I would like to do is ask Chip for a single-bit CRC instruction for the P2. IMHO the best format would be to have the data bit in the C flag. Because only single-operand ([#]D) instruction slots are available on the P2, I thought that the polynomial could be written to ACCA (or perhaps ACCB?) and that D would point to the CRC register in cog memory.
This is the CRC calculation in spin for a byte...
d := DATA & $FF
repeat i from 0 to 7
  c := (d ^ crc) & $01          ' data bit 0 XOR crc bit 0
  d := d >> 1                   ' data >> 1
  crc := crc >> 1               ' crc >> 1
  if c
    crc := crc ^ poly           ' if c==1: crc xor poly
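For anyone wanting to check the Spin routine above off-target, here is a direct Python port of the byte loop. Because the loop shifts the CRC right (LSB first), poly must be supplied bit-reversed. For example, $A001 is $8005 reversed, which gives the common CRC-16/ARC variant (initial value 0, no final XOR). The helper names are illustrative only.

```python
def crc_byte_lsb_first(crc, data, poly):
    """Port of the Spin loop: accumulate one byte, LSB first."""
    d = data & 0xFF
    for _ in range(8):
        c = (d ^ crc) & 0x01   # data bit 0 XOR crc bit 0
        d >>= 1                # data >> 1
        crc >>= 1              # crc >> 1
        if c:
            crc ^= poly        # if c == 1: crc XOR poly
    return crc


def crc16_arc(message):
    """CRC-16/ARC: reversed poly $A001, initial value 0, no final XOR."""
    crc = 0
    for byte in message:
        crc = crc_byte_lsb_first(crc, byte, 0xA001)
    return crc


print(hex(crc16_arc(b"123456789")))  # 0xbb3d, the standard check value
```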
Post-edit (29 May 2015):
This is the corrected code for a single bit (msb first)...
Here poly = $8005, crc16 is initially 0, and the data bit is in bit 0 of d:
c := (d ^ (crc16 >> 15)) & $0001      ' data bit XOR crc16[15]
crc16 := (crc16 << 1) & $FFFF         ' crc16 << 1
if c                                  ' c == 1?
  crc16 := crc16 ^ poly               ' c == 1: crc XOR poly
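The corrected MSB-first version can be checked the same way. In this Python sketch, each byte is fed to the single-bit step from bit 7 down to bit 0. With poly $8005 and crc16 starting at 0, it should reproduce the catalogued CRC-16/UMTS (also known as CRC-16/BUYPASS) check value. The poly and init parameters are added here only so the same step can also be tried with $1021.

```python
def crc16_bit_msb_first(crc16, d, poly):
    """One step of the corrected Spin code; the data bit is bit 0 of d."""
    c = (d ^ (crc16 >> 15)) & 0x0001   # data bit XOR crc16[15]
    crc16 = (crc16 << 1) & 0xFFFF      # crc16 << 1
    if c:
        crc16 ^= poly                  # c == 1: crc XOR poly
    return crc16


def crc16_msb(message, poly=0x8005, init=0x0000):
    """Feed each byte MSB first through the single-bit step."""
    crc16 = init
    for byte in message:
        for i in range(7, -1, -1):
            crc16 = crc16_bit_msb_first(crc16, byte >> i, poly)
    return crc16


print(hex(crc16_msb(b"123456789")))          # 0xfee8 (CRC-16/UMTS)
print(hex(crc16_msb(b"123456789", 0x1021)))  # 0x31c3 (CRC-16/XMODEM)
```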
This is a possible P2 CRC bit accumulate instruction format...
CRCBIT D
where D = CRC Register, C = current data bit, ACCA = polynomial
The CRCBIT instruction performs the following...
(1) X := C XOR D[0]
(2) D := D >> 1
(3) if X == 1 then D := D XOR ACCA
The idea is that for bit-banging, the CRCBIT instruction would be called for each bit sent/received, and the bit would already be in C.
I expect CRCBIT should be capable of being a 1 clock instruction.
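To make the three steps concrete, here is a Python model of the proposed CRCBIT. It is a hypothetical instruction at this point, so the function name and argument order are just for illustration. Feeding it the bits of a byte LSB first, with ACCA holding the bit-reversed polynomial, performs the same work as one pass of the Spin byte loop.

```python
def crcbit(d, c, acca):
    """Model of the proposed CRCBIT D, with C = data bit and ACCA = polynomial."""
    x = (c ^ d) & 1    # (1) X := C XOR D[0]
    d >>= 1            # (2) D := D >> 1
    if x:              # (3) if X == 1 then D := D XOR ACCA
        d ^= acca
    return d


# One byte ($31), LSB first, with ACCA = $A001 (the bit-reversed form of $8005).
crc = 0
for i in range(8):
    crc = crcbit(crc, (0x31 >> i) & 1, 0xA001)
```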
So, to accumulate an 8 bit byte (disregarding any reversals and initialisation) the following could be used...
This would take 2+16 clocks per byte, or 2+64 clocks for the 4 bytes of a passed long.
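The byte loop itself can be modelled as a REP-style loop that shifts the data right into C and then executes CRCBIT, once per bit. In the Python sketch below (function names hypothetical), the instruction never sees the CRC width, because the initial value and final XOR are handled in software. The same loop accumulates either a byte (8 iterations) or a whole long (32 iterations), assuming longs are packed little-endian. CRC-32 is used as the test case: reversed polynomial $EDB88320, with initial value and final XOR both $FFFFFFFF.

```python
MASK32 = 0xFFFF_FFFF
POLY32 = 0xEDB8_8320  # bit-reversed CRC-32 polynomial


def crcbit(d, c, acca):
    """Proposed CRCBIT: X := C XOR D[0]; D >>= 1; if X: D ^= ACCA."""
    x = (c ^ d) & 1
    d >>= 1
    return d ^ acca if x else d


def accumulate(crc, data, nbits, acca):
    """REP-style loop: shift data right into C, then CRCBIT, nbits times."""
    for _ in range(nbits):
        c = data & 1      # the shift moves bit 0 into C
        data >>= 1
        crc = crcbit(crc, c, acca)
    return crc & MASK32


def crc32_by_bytes(message):
    crc = MASK32                       # initial value, done in software
    for byte in message:
        crc = accumulate(crc, byte, 8, POLY32)
    return crc ^ MASK32                # final XOR, done in software


def crc32_by_longs(message):
    """Same CRC, 4 bytes per pass; assumes len(message) is a multiple of 4."""
    crc = MASK32
    for i in range(0, len(message), 4):
        long_ = int.from_bytes(message[i:i + 4], "little")
        crc = accumulate(crc, long_, 32, POLY32)
    return crc ^ MASK32


print(hex(crc32_by_bytes(b"123456789")))  # 0xcbf43926
```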
Now that the stack has been widened to 32 bits, I expect more data to be put in it. Is 8 levels enough? I know it's not meant to be a general-purpose stack, but it does have the advantage of being a "standard location" for passing parameters and return values between third-party code.
As the stack is now 32-bit, would it be better for C and Z to be bits 31 and 30 for CALLs and POPs?
The CALLD instructions use the same bit locations to store the flags, too. If they all changed, it could pave the way for future program counter expansion.
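To illustrate the layout question, the Python sketch below packs a return address and the C/Z flags into a stack entry in two ways. The first layout, {C, Z, PC[19:0]} in 22 bits, is an assumption inferred from the stack having been 22 bits wide. The second puts C and Z in bits 31 and 30, as suggested. With the flags at the top, the PC field could later grow up to 30 bits without moving them.

```python
PC_MASK = 0x000F_FFFF  # 20-bit program counter


def pack_22bit(pc, c, z):
    """Assumed old layout: C in bit 21, Z in bit 20, PC in bits 19..0."""
    return (c << 21) | (z << 20) | (pc & PC_MASK)


def pack_top_flags(pc, c, z):
    """Suggested layout: C in bit 31, Z in bit 30, PC in the low bits."""
    return (c << 31) | (z << 30) | (pc & PC_MASK)


def unpack_top_flags(entry):
    """Recover (pc, c, z); the PC field may use anything below bit 30."""
    return entry & 0x3FFF_FFFF, (entry >> 31) & 1, (entry >> 30) & 1


print(hex(pack_22bit(0x12345, 1, 1)))      # 0x312345
print(hex(pack_top_flags(0x12345, 1, 0)))  # 0x80012345
```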
The problem is that the stack is currently built from flipflops. To get any real size increase, we would have to go to a RAM. That might be a big change with OnSemi. I think we would have to buffer the top of the stack with 32 flipflops.
Next: get SD working.
The Z flag from the new ONES instruction doesn't seem to reflect a zero result (result=0).
Okay. I forgot what it does, exactly, and I won't be back till later, but I'll look at it tonight.
Thanks. I had C and Z swapped on ENCOD and ONES. Fixed now.
I feel like this is getting pretty 'final' now.
Thanks, all you guys, for your input.
Yes, having those data paths necked down was akin to having too tight a collar on a shirt.
forums.parallax.com/discussion/151992/crc-generation/p1
The byte-at-a-time CRCBIT method from that thread does not require the CRCBIT instruction to know the number of bits in the algorithm.
There is further info here...
forums.parallax.com/discussion/151821/usb-helper-instruction-p2-possible-additional-instructions
We need to be able to put whole bytes into it, right?
At the moment, I only have 8-cog/64-smart-pin FPGA files for the BeMicro-A9 and the Prop123-A9.
I'll add more images tomorrow.
A REP loop could perform a shift instruction followed by the CRCBIT instruction.
My thought was to make the instruction as generic as possible to allow for all the various CRC incarnations.
They seem to be the only ones being used currently. You might wait to see if anyone wants any other files, to save your time.
Both A9 V29 images loaded and running Ok. Thanks!
BTW, regarding images for the other FPGA boards, I have multiple test setups and use the DE2-115 and DE0-Nano as well.
But how wide must the rotator be? 16 bits with options for 8 and 5 bits?
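On the width question, one observation: with the right-shifting CRCBIT form proposed earlier, the width only shows up in the sizes of the (bit-reversed) polynomial, initial value and final XOR. That means the same datapath handles 5-bit and 16-bit CRCs. The Python sketch below checks this against the catalogued check values for the two USB CRCs: CRC-5/USB for tokens and CRC-16/USB for data packets. The function names are illustrative only. On the wire, the USB token CRC5 covers an 11-bit field rather than whole bytes, which a bit-serial instruction handles naturally.

```python
def crcbit(d, c, acca):
    """Proposed CRCBIT: X := C XOR D[0]; D >>= 1; if X: D ^= ACCA."""
    x = (c ^ d) & 1
    d >>= 1
    return d ^ acca if x else d


def crc_reflected(message, width, poly_rev, init, xorout):
    """Bit-serial reflected CRC built only from CRCBIT steps, LSB first."""
    crc = init
    for byte in message:
        for i in range(8):
            crc = crcbit(crc, (byte >> i) & 1, poly_rev)
    return (crc ^ xorout) & ((1 << width) - 1)


# CRC-5/USB: poly $05 -> reversed in 5 bits = $14, init and final XOR $1F
crc5 = crc_reflected(b"123456789", 5, 0x14, 0x1F, 0x1F)
# CRC-16/USB: poly $8005 -> reversed = $A001, init and final XOR $FFFF
crc16 = crc_reflected(b"123456789", 16, 0xA001, 0xFFFF, 0xFFFF)
print(hex(crc5), hex(crc16))  # 0x19 0xb4c8
```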
These changes are just minor refinements, not big things that would require deep re-thinking.