Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i - Page 111 — Parallax Forums


Comments

  • Ariba Posts: 2,551
    edited 2017-12-16 08:59
    Cluso99 wrote: »

    The reasoning behind requesting a single bit CRC instruction is the possibility of accumulating the CRC as each bit of the byte/block is transmitted/received.

    Often there is insufficient time at the end of a block to perform the CRC calculation for the last/all bytes and get a reply out (ack or nak) in the required time. This is the case with USB and P1.

    When I asked for this instruction 4+ years ago, I had spent a lot of time understanding the USB protocol. I previously spent many years designing hardware and writing synchronous communications in the 80's and 90's.

    Please, just leave it as a single-bit CRC instruction. A lookup table can be used if you want byte-wise CRC calculation.

    But it's the Smartpin logic that receives the single bits, do you even have access to them?

    At the end of a received packet there are 2 CRC bytes, which are not part of the CRC calculation for the packet data, so you should have enough time to calculate the CRC of the last data byte while receiving the two CRC bytes. Then it's just a compare to decide whether the CRC is correct and you have to send an ACK or a NAK.

    At Transmit you can calculate the CRC one byte ahead, while sending. So you have the CRC value ready when you reach the end of the data.

    The case on P2 is totally different from P1, where you have to do every bit with bitbanging at the right time.

    Andy
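Andy's receive/transmit scheme can be illustrated with a minimal Python sketch. It assumes the CRC-16 commonly used for USB data packets: polynomial $8005 processed LSb first (the reflected constant 0xA001), initial value 0xFFFF, and the result complemented.

The per-byte table update is what lets the transmitter keep the CRC one byte ahead. It also lets the receiver finish the data CRC while the two CRC bytes are still arriving. The helper names (`crc16_update`, `tx_packet`, `rx_packet`) are hypothetical, and so is the low-byte-first order of the appended CRC. None of this is a Smartpin or USB-stack API.

```python
# Reflected (LSb-first) form of x^16 + x^15 + x^2 + 1 ($8005).
POLY_REFLECTED = 0xA001


def make_table():
    """256-entry table: the effect of eight LSb-first CRC shifts on each byte value."""
    table = []
    for value in range(256):
        crc = value
        for _ in range(8):
            crc = (crc >> 1) ^ POLY_REFLECTED if crc & 1 else crc >> 1
        table.append(crc)
    return table


TABLE = make_table()


def crc16_update(crc, byte):
    """Fold one byte into a running reflected CRC-16 (one table lookup per byte)."""
    return (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]


def tx_packet(data):
    """Transmit side: the CRC is updated as each byte is queued, so it is ready at the end."""
    crc = 0xFFFF
    frame = []
    for byte in data:
        frame.append(byte)             # byte goes out on the wire...
        crc = crc16_update(crc, byte)  # ...while its CRC contribution is folded in
    crc ^= 0xFFFF                      # the CRC is sent complemented
    frame += [crc & 0xFF, crc >> 8]    # assumed: low byte first
    return frame


def rx_packet(frame):
    """Receive side: update per data byte; once the CRC bytes arrive, only a compare is left."""
    crc = 0xFFFF
    for byte in frame[:-2]:
        crc = crc16_update(crc, byte)
    received = frame[-2] | (frame[-1] << 8)
    return (crc ^ 0xFFFF) == received  # True -> ACK, False -> NAK


frame = tx_packet(list(b"hello"))
print(rx_packet(frame))  # True
```

The table costs 512 bytes of storage in exchange for one lookup per byte. That is the same trade-off Cluso99 mentions for byte-wise CRCs.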

  • Tor Posts: 2,010
    I don't know if it matters for the bit/byte crc discussion, but there are 12-bit CRCs out there, and the one I worked with couldn't be implemented with a table.
  • cgracey Posts: 13,631
    edited 2017-12-16 10:19
    Tor wrote: »
    I don't know if it matters for the bit/byte crc discussion, but there are 12-bit CRCs out there, and the one I worked with couldn't be implemented with a table.

    Interesting. That would need some hardware, then.

    I'm still thinking about how to approach this.

    Doing 8 bits at once might take too many gates and too much time, but I'm thinking that 4 bits at once might be a good balance. That would accommodate a 12-bit CRC gracefully.

    There could be two 2-clock CRC instructions:
    CRCBIT  crc,poly        'use C as the input
    CRCNIB  crc,poly        'use Q[31:28] as the input, Q shifts left by 4
    

    CRCNIB is shielded from interrupts, as is SETQ. Here is a 12-bit CRC operation:
    SHL     data,#32-12
    SETQ    data
    CRCNIB  crc,poly
    CRCNIB  crc,poly
    CRCNIB  crc,poly
    
    ...and to make a 13-bit CRC operation...
    SHL     data,#32-12   WC
    CRCBIT  crc,poly
    SETQ    data
    CRCNIB  crc,poly
    CRCNIB  crc,poly
    CRCNIB  crc,poly
    
    ...and to make a 5-bit CRC operation...
    SHL     data,#32-4    WC
    CRCBIT  crc,poly
    SETQ    data
    CRCNIB  crc,poly
    
    ...and an 8-bit CRC operation in 8 clocks...
    SHL     data,#32-8
    SETQ    data
    CRCNIB  crc,poly
    CRCNIB  crc,poly
    

    How about that? It might seem a little piecemeal, but it adds no flipflops.
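A small Python model can check that the proposed sequences compose the way described above. It rests on these assumptions:

- CRCBIT behaves as in the later P2 documentation: if C XOR crc[0], then crc = (crc >> 1) XOR poly, else crc >>= 1.
- CRCNIB is four such steps fed from Q[31], Q[30], Q[29], Q[28], after which Q shifts left by 4.
- The 12-bit polynomial value is arbitrary and hypothetical, chosen only for illustration.

```python
import random

MASK32 = 0xFFFFFFFF


def crcbit(crc, poly, c):
    """Assumed CRCBIT: one CRC step using carry C as the input bit."""
    return (crc >> 1) ^ poly if (c ^ (crc & 1)) else crc >> 1


def crcnib(crc, poly, q):
    """Assumed CRCNIB: four CRCBIT steps from Q[31:28]; returns (crc, Q << 4)."""
    for bit in (31, 30, 29, 28):
        crc = crcbit(crc, poly, (q >> bit) & 1)
    return crc, (q << 4) & MASK32


def crc_reference(crc, poly, data, width):
    """Reference: feed `width` data bits MSB first, one CRCBIT per bit."""
    for i in range(width - 1, -1, -1):
        crc = crcbit(crc, poly, (data >> i) & 1)
    return crc


def crc12(crc, poly, data):
    q = (data << (32 - 12)) & MASK32    # SHL data,#32-12 / SETQ data
    for _ in range(3):                  # CRCNIB x3
        crc, q = crcnib(crc, poly, q)
    return crc


def crc13(crc, poly, data):
    shifted = data << (32 - 12)         # SHL data,#32-12 WC
    c = (shifted >> 32) & 1             # C = last bit shifted out (data bit 12)
    crc = crcbit(crc, poly, c)          # CRCBIT handles the 13th bit first
    q = shifted & MASK32                # SETQ data
    for _ in range(3):                  # CRCNIB x3
        crc, q = crcnib(crc, poly, q)
    return crc


POLY12 = 0xF01  # hypothetical 12-bit example polynomial
rng = random.Random(1)
for _ in range(1000):
    crc, data = rng.getrandbits(12), rng.getrandbits(13)
    assert crc12(crc, POLY12, data) == crc_reference(crc, POLY12, data & 0xFFF, 12)
    assert crc13(crc, POLY12, data) == crc_reference(crc, POLY12, data, 13)
```

The same pattern covers the 5-bit case: `SHL data,#32-4 WC` hands bit 4 to CRCBIT and leaves bits 3..0 in Q[31:28] for one CRCNIB.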
  • I assume these different CRC formats have different shift directions and LSB/MSB taps?
    Can these CRC instructions accommodate these differences?
  • Rayman Posts: 12,320
    Could overflow of the 8 deep stack be made to trigger an interrupt?
  • Rayman wrote: »
    Could overflow of the 8 deep stack be made to trigger an interrupt?
    What are people expecting to use the hardware stack for? I don't think it will be used by either C or Spin, since it isn't big enough and doesn't provide the flexibility to set up stack frames. It will certainly be good for temporaries, but then I would think the depth of 8 would be sufficient. Also, any code that uses it would have to disable interrupts if the interrupt service routines also use the HW stack.
  • Rayman Posts: 12,320
    I don't use interrupts, at least not yet.

    I do use call and ret though, which uses the stack.

    I think I've gotten to about 5 deep in the stack.
    Maybe I can't get to 8, but it is something that I feel I have to keep in the back of my mind...
  • cgracey Posts: 13,631
    ozpropdev wrote: »
    I assume these different CRC formats have different shift directions and LSB/MSB taps?
    Can these CRC instructions accommodate these differences?

    I think that, at the heart, they are all the same. You may have to pre-reverse your data if it is LSB first.
  • cgracey wrote: »
    Tor wrote: »
    I don't know if it matters for the bit/byte crc discussion, but there are 12-bit CRCs out there, and the one I worked with couldn't be implemented with a table.

    Interesting. That would need some hardware, then.

    I'm still thinking about how to approach this.

    Doing 8 bits at once might take too many gates and too much time, but I'm thinking that 4 bits at once might be a good balance. That would accommodate a 12-bit CRC gracefully.

    There could be two 2-clock CRC instructions:
    CRCBIT  crc,poly        'use C as the input
    CRCNIB  crc,poly        'use Q[31:28] as the input, Q shifts left by 4
    
    ...and to make a 5-bit CRC operation...
    SHL     data,#32-4    WC
    CRCBIT  crc,poly
    SETQ    data
    CRCNIB  crc,poly
    
    ...and an 8-bit CRC operation in 8 clocks...
    SHL     data,#32-8
    SETQ    data
    CRCNIB  crc,poly
    CRCNIB  crc,poly
    

    How about that? It might seem a little piecemeal, but it adds no flipflops.
    I'm still a bit fuzzy on calculating CRC5 and CRC16 in byte chunks, since there can be an odd number of bytes (and bits, in the case of token and start-of-frame packets) of data to calculate. There will be a need for additional housekeeping code to ensure that the accumulated CRC value stays within the domain of the polynomial, right? Sorry if this is a stupid question, but my math sucks :blush:

    For a time-wise comparison using lookup tables, the CRC16 takes 13 clocks per byte and the CRC5 about 32 clocks for the 11-bit token and start-of-frame data.

    Also, contrary to my earlier statement, USB CRC calcs are done LSb->MSb :blush::blush:
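On the odd-length question: no extra housekeeping is needed. The CRC register never holds more than the polynomial's width, so whole bytes can go through a table and any leftover bits are folded in one at a time. The Python sketch below assumes the CRC-5 parameters commonly catalogued for USB tokens: x^5 + x^2 + 1 processed LSb first (reflected constant 0x14), initial value 0x1F, and the result inverted. The function names are hypothetical.

```python
POLY5_REFLECTED = 0x14  # x^5 + x^2 + 1 (0b00101), bit-reversed over 5 bits


def crc5_step(crc, bit):
    """One LSb-first CRC-5 step."""
    return (crc >> 1) ^ POLY5_REFLECTED if (crc ^ bit) & 1 else crc >> 1


def crc5_bitwise(value, nbits):
    """Reference: every bit individually, LSb first, init 0x1F, result inverted."""
    crc = 0x1F
    for i in range(nbits):
        crc = crc5_step(crc, (value >> i) & 1)
    return crc ^ 0x1F


# Byte table: eight CRC-5 steps (with zero input bits) applied to each 8-bit value.
TABLE5 = []
for value in range(256):
    crc = value
    for _ in range(8):
        crc = crc5_step(crc, 0)
    TABLE5.append(crc)


def crc5_token(field11):
    """11-bit token field: one table lookup for bits 0..7, then the 3 leftover bits."""
    crc = TABLE5[(0x1F ^ field11) & 0xFF]   # low byte in a single step
    for i in range(8, 11):                  # remaining bits 8..10, one at a time
        crc = crc5_step(crc, (field11 >> i) & 1)
    return crc ^ 0x1F
```

Because the 5-bit register is simply XORed into the low bits of the byte before the lookup, the result after eight shifts is always back within 5 bits. Mixing table steps and single-bit steps therefore gives the same answer as a purely bitwise loop.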

  • Cluso99 Posts: 18,018
    The CRCBIT instruction will handle the various CRC versions.

    Remember that there are two mainstream CRC16 variants in common use:
    the original IBM version and the CCITT version. Then Microcom implemented the MNP protocol, but because they misunderstood how the CRC16 worked, they implemented it incorrectly (a bug!). It nevertheless became yet another standard, although it used one of the two CRC16 polynomials.
    The variations are mainly to do with the initial value, MSB or LSB, and if the result is inverted, and which byte (for 16 bit crc) comes first. All these variations are covered by selecting the polynomial, the initial value, reversing the result, and inverting the result. So you can calculate these CRCs by the one method.
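The point that one method covers the common variants can be made concrete with the usual parameter model: polynomial, initial value, input/output bit reflection, and final XOR. This Python sketch uses variant names that follow common CRC catalogues rather than anything stated in this thread. The MNP variant is not reproduced here.

```python
def reflect(value, width):
    """Reverse the lowest `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def crc16(data, poly, init, refin, refout, xorout):
    """Generic MSB-first CRC-16; the variants differ only in these five parameters."""
    crc = init
    for byte in data:
        if refin:
            byte = reflect(byte, 8)       # LSb-first protocols reflect each input byte
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    if refout:
        crc = reflect(crc, 16)            # and the final register
    return crc ^ xorout                   # optional inversion of the result


check = b"123456789"
print(hex(crc16(check, 0x8005, 0x0000, True, True, 0x0000)))   # IBM-style ("ARC")
print(hex(crc16(check, 0x1021, 0xFFFF, False, False, 0x0000)))  # CCITT-style ("CCITT-FALSE")
```

Byte order of the transmitted 16-bit result is a framing choice outside this function, which matches the last item in the list above.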
  • On the Prop1, if you have a way to get each bit into a flag already, you can compute a CRC with only 2 additional instructions per bit. For example, if you have the bit in question stored in Z you can do
              shr crc, #1 wc
    IF_C_NE_Z xor crc, crc_poly
    
    Jonathan
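The two-instruction P1 sequence can be mirrored in Python to show that it is the standard LSb-first (reflected) CRC step. The defaults (reflected polynomial 0xA001, initial value 0xFFFF) are an assumption. They were chosen so that the result can be compared with the well-known CRC-16/MODBUS check value.

```python
def p1_crc_step(crc, poly, z):
    """Mirror of:  shr crc,#1 wc  /  IF_C_NE_Z xor crc,crc_poly  (data bit held in Z)."""
    c = crc & 1          # SHR ... WC: C receives the bit shifted out (old bit 0)
    crc >>= 1
    if c != z:           # IF_C_NE_Z: XOR the polynomial only when C and Z differ
        crc ^= poly
    return crc


def crc16_lsb_first(data, poly=0xA001, init=0xFFFF):
    """Feed each byte LSb first through the two-instruction step."""
    crc = init
    for byte in data:
        for i in range(8):
            crc = p1_crc_step(crc, poly, (byte >> i) & 1)
    return crc


print(hex(crc16_lsb_first(b"123456789")))
```

Comparing C with Z is the same as testing (crc[0] XOR data bit), so no separate XOR of the data bit is needed.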
  • cgracey Posts: 13,631
    garryj wrote: »
    Also, contrary to my earlier statement, USB CRC calcs are done LSb->MSb :blush::blush:

    That wouldn't be a problem. Just use a REV instead of a SHL.
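A Python sketch shows why REV works for LSb-first protocols. REV moves data bit 0 into bit 31, which is the first bit CRCNIB consumes from Q. The model assumes the CRCBIT/CRCNIB semantics of the later P2 documentation (CRCNIB = four CRCBIT steps from Q[31:28], then Q shifts left by 4). The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rev32(x):
    """Model of REV: bit 0 <-> bit 31, bit 1 <-> bit 30, ..."""
    return int(format(x & MASK32, "032b")[::-1], 2)


def crcbit(crc, poly, c):
    """Assumed CRCBIT: if C xor crc[0], crc = (crc >> 1) ^ poly, else crc >>= 1."""
    return (crc >> 1) ^ poly if (c ^ (crc & 1)) else crc >> 1


def crcnib(crc, poly, q):
    """Assumed CRCNIB: four CRCBIT steps from Q[31:28], then Q <<= 4."""
    for bit in (31, 30, 29, 28):
        crc = crcbit(crc, poly, (q >> bit) & 1)
    return crc, (q << 4) & MASK32


def byte_via_rev(crc, poly, byte):
    q = rev32(byte)               # REV data: byte bit 0 lands in Q[31]
    crc, q = crcnib(crc, poly, q)  # bits 0..3
    crc, q = crcnib(crc, poly, q)  # bits 4..7
    return crc


def byte_lsb_first(crc, poly, byte):
    """Reference: feed the byte LSb first, one bit at a time."""
    for i in range(8):
        crc = crcbit(crc, poly, (byte >> i) & 1)
    return crc
```

With the reflected $8005 polynomial (0xA001) and an initial value of 0xFFFF, this reproduces the usual reflected CRC-16 on byte streams.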
  • cgracey Posts: 13,631
    lonesock wrote: »
    On the Prop1, if you have a way to get each bit into a flag already, you can compute a CRC with only 2 additional instructions per bit. For example, if you have the bit in question stored in Z you can do
              shr crc, #1 wc
    IF_C_NE_Z xor crc, crc_poly
    
    Jonathan

    That's a nice way to do it. Almost makes a CRC instruction look silly.
  • Cluso99 Posts: 18,018
    cgracey wrote: »
    lonesock wrote: »
    On the Prop1, if you have a way to get each bit into a flag already, you can compute a CRC with only 2 additional instructions per bit. For example, if you have the bit in question stored in Z you can do
              shr crc, #1 wc
    IF_C_NE_Z xor crc, crc_poly
    
    Jonathan

    That's a nice way to do it. Almost makes a CRC instruction look silly.
    It's only silly if you have the time for the extra instructions. That is my point.
    However, it's the same deal with those other TJx and DJx instructions.

    I have a similar NRZI instruction request but I haven't had time to re-verify my request of years ago.

    The combination of the NRZI and CRC is essential for bit-banging FS USB or other similar protocols.

    While we do have USB with SmartPins, there may be other protocols where the specific SmartPin functions won't work. For the little work and silicon involved, it seems prudent to have the flexible options these two instructions would provide. I would not have asked if these two weren't imperative for the bit-bang approach. They could also cover any shortcomings or bugs in the SmartPins, should we be unlucky enough to find some.
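For context on the NRZI half of the request, here is a Python sketch of the line coding a bit-banged full-speed USB driver has to produce. The encoding follows the usual USB rules:

- A 0 bit toggles the line level, and a 1 bit leaves it unchanged.
- A 0 is stuffed after six consecutive 1s.
- The starting line level of 1 and all function names are assumptions made for illustration.

```python
def bit_stuff(bits):
    """Insert a 0 after every run of six consecutive 1s."""
    out, ones = [], 0
    for b in bits:
        out.append(b)
        ones = ones + 1 if b else 0
        if ones == 6:
            out.append(0)   # stuffed bit forces a transition on the line
            ones = 0
    return out


def bit_unstuff(bits):
    """Drop the bit that follows every run of six 1s (the stuffed 0)."""
    out, ones, skip = [], 0, False
    for b in bits:
        if skip:
            skip, ones = False, 0
            continue
        out.append(b)
        ones = ones + 1 if b else 0
        if ones == 6:
            skip = True
    return out


def nrzi_encode(bits, level=1):
    """NRZI as used by USB: a 0 toggles the line level, a 1 keeps it."""
    out = []
    for b in bits:
        if b == 0:
            level ^= 1
        out.append(level)
    return out


def nrzi_decode(levels, level=1):
    """Inverse: no change -> 1, change -> 0."""
    out = []
    for current in levels:
        out.append(1 if current == level else 0)
        level = current
    return out


payload = [1, 1, 1, 1, 1, 1, 1, 0]
line = nrzi_encode(bit_stuff(payload))
assert bit_unstuff(nrzi_decode(line)) == payload
```

In a bit-banged receiver, each decoded, unstuffed bit is also the natural input for a per-bit CRC step. That is the combination described above.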
  • cgracey Posts: 13,631
    edited 2017-12-17 09:55
    Cluso99,

    I got the CRC worked out. We have CRCBIT (uses C) and CRCNIB (uses Q[31:28], shifts Q left by 4 bits, shields interrupts):
    dat	org
    
    	bmask	dirb,#15
    
    	jmp	#.nibs		'nibs, comment out for bits
    
    
    .bits	rep	#2,#32		'32 bits, 130 clocks
    	shl	b,#1	wc
    	crcbit	crc,poly
    
    	jmp	#.done
    
    
    .nibs	setq	b		'8 nibbles, 18 clocks
    	crcnib	crc,poly
    	crcnib	crc,poly
    	crcnib	crc,poly
    	crcnib	crc,poly
    	crcnib	crc,poly
    	crcnib	crc,poly
    	crcnib	crc,poly
    	crcnib	crc,poly
    
    
    .done	mov	outb,crc	'same result of $E578
    	jmp	#$
    
    b	long	$12345678	'data
    crc	long	$FFFF		'initial crc
    poly	long	$8005 >< 16	'polynomial
    

    It generates correct results!

    You can do bits or nibbles at a time. Nibble operations can be stacked, handling four bits every two clocks.
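The two paths in the test program can be mirrored in Python to show why they must agree. CRCNIB is four CRCBIT steps fed from the top of Q, which are exactly the bits `SHL b,#1 WC` would deliver.

The sketch rests on two assumptions: the CRCBIT/CRCNIB semantics of the later P2 documentation, and 2 clocks per instruction. It checks only that the two paths match and reproduces the clock counts in the comments. It does not assert the $E578 value reported in the post.

```python
MASK32 = 0xFFFFFFFF


def rev_low(value, bits):
    """Model of the PASM expression  value >< bits  (reverse the low `bits` bits)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def crcbit(crc, poly, c):
    """Assumed CRCBIT: if C xor crc[0], crc = (crc >> 1) ^ poly, else crc >>= 1."""
    return (crc >> 1) ^ poly if (c ^ (crc & 1)) else crc >> 1


def crcnib(crc, poly, q):
    """Assumed CRCNIB: four CRCBIT steps from Q[31:28], then Q <<= 4."""
    for bit in (31, 30, 29, 28):
        crc = crcbit(crc, poly, (q >> bit) & 1)
    return crc, (q << 4) & MASK32


def run_bits(b, crc, poly):
    for _ in range(32):                 # rep #2,#32
        c = (b >> 31) & 1               # shl b,#1 wc  (C = old bit 31)
        b = (b << 1) & MASK32
        crc = crcbit(crc, poly, c)      # crcbit crc,poly
    return crc


def run_nibs(b, crc, poly):
    q = b                               # setq b
    for _ in range(8):                  # crcnib crc,poly  x8
        crc, q = crcnib(crc, poly, q)
    return crc


B, CRC0 = 0x12345678, 0xFFFF
POLY = rev_low(0x8005, 16)              # $8005 >< 16
BITS_CLOCKS = 2 + 32 * (2 + 2)          # REP + 32 x (SHL + CRCBIT)
NIBS_CLOCKS = 2 + 8 * 2                 # SETQ + 8 x CRCNIB
print(hex(POLY), hex(run_bits(B, CRC0, POLY)), hex(run_nibs(B, CRC0, POLY)))
```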
  • Cluso99 Posts: 18,018
    Excellent thanks Chip!
  • Peter Jakacki Posts: 10,193
    edited 2017-12-18 13:45
    While testing the new SD driver together with improved Ethernet drivers (W5500 block speeds 6x faster, about 1 MB/s), I suddenly ran into a problem with my PNut-compiled kernel itself. It started acting strangely: it wouldn't load Forth code, and it kept switching back to binary input mode even though nothing was telling it to. The exact same version had worked well earlier in the day, so I tried to track down a bug that made no sense, until I remembered the extra nop before the coginits that I had used before but had since disabled. As soon as I enabled the nop the problem went away, and as soon as I disabled it the problem reappeared.

    But the exact same version on the exact same V29 has been working. It really seems to be some marginal timing problem, perhaps only with the A9s, but as you know I have had funny problems for no real reason on previous FPGA versions. I will try to pin down exactly what is happening while it is still playing up, but I think it has something to do with the cogid I do with every coginit that runs from hubexec, so I will look there.

    EDIT: I moved the nop to just before the cogid in hubexec and the problem went away. Here are the sections of code in question:
    dat
    		orgh	0
    
    		org
    
    	        clkset  #$FF                    'switch to 80MHz (if pll, else 50MHz)
    reboot
    '                nop			' seems to need delay after clkset (otherwise next coginit ids incorrectly)
                    coginit #7,#@RESET
                    coginit #6,#@RESET
                    coginit #5,#@RESET
                    coginit #4,#@RESET
                    coginit #3,#@RESET
                    coginit #2,#@RESET	'vgarun (when DACs are made available)
                    coginit #1,#@rxcog
                    coginit #0,#@RESET
    
    org 0
    RESET		call	#INITCOG		' run non-time critical init from hubexec
    		jmp	#doNEXT
    
    
    '***************************************** HUB CODE ***************************
    '
    dat
     		orgh
    
    version		long	vernum,vertime
    vername		byte	"V28 BOOT"
    
    
    INITCOG
    		loc	PTRA,#@IDLE
    		nop				' !!! added the nop here instead and the bug is gone
    		cogid	X  			'only cog 0 uses the serial port by default
    		tjnz	X,#INITSTKS
    
  • cgracey Posts: 13,631
    edited 2017-12-18 14:58
    Peter,

    I suspect that this bug has something to do with the hub FIFO interface underflowing in some cases.

    I will make a new version which adds a few levels to the FIFO buffer and I will increase the 'full' level. We'll see if this fixes the problem.

    I had made a FIFO simulator for the 16-cog version that revealed how many levels were needed. Perhaps my reduction for 8 cogs was too simplistic. Or, my simulation model was incomplete.
  • cgracey Posts: 13,631
    Peter, could you please show more context?

    I am interested to know what happens from the prior branch (which resets the FIFO) to the failure point.
  • Being an "old" Prop1 fan who has followed the progress of your work for years, here is a question a bit apart from the ongoing discussion in this thread:

    Are there any Prop123-A9 boards left for sale?
    (and if yes: how can I get one to Germany?)

    That would be great!
  • jmg Posts: 14,850
    .... until I remembered the extra nop before the coginits that I had used before but was now disabled. As soon as I enabled the nop the problem went away, and as soon as I disable the problem reappears.
    ...
    EDIT: I moved the nop to just before the cogid in hubexec and the problem went away. Here are the sections of code in question:

    Checking what you are saying here: there are (at least?) two locations where a NOP can fix the problem(s), and either one works?
    Both seem to precede COGxx opcodes? One comes after clkset.

    If the issue were only clkset-related, should the second NOP have any effect?
    What is the exact relative timing of those two lines, i.e. how many sysclks separate them?
  • All the files I am using are in my P2 Dropbox folder, but here are the current files. Only taqoz.spin2 is needed, though.
  • cgracey Posts: 13,631
    edited 2017-12-18 22:16
    ikemschn wrote: »
    Being an "old" Prop1-fan and following the progress of your work for years, here a question a bit apart from the ongoing context in this thread:

    Are there any Prop123-A9-boards left for sale?
    (and if yes: how to get one to Germany?)

    Would be great !

    We have some in stock. They are part #60065. They are $475.00.

    If you email Chantal at Parallax, she can get the order going: cwoods@parallax.com.

    Welcome aboard!!!
  • Peter Jakacki Posts: 10,193
    edited 2017-12-18 22:45
    I have some further information about that startup bug I'm seeing. Adding a nop almost anywhere in the startup path fixes the problem, yet if I add three nops instead of one, the problem comes back.
    INITCOG
    		nop
    		nop
    		nop
    		loc	PTRA,#@IDLE		' default startup into Instruction Pointer
    		cogid	X
    		tjnz	X,#INITSTKS
    

    My console prompt includes a radix symbol which should be # for decimal but this changes to something else as a symptom of the problem.
    TAQOZ#
    may instead startup as:
    TAQOZ%
    for instance, but that is only a symptom.
  • cgracey Posts: 13,631
    Peter, thanks for the info. This seems to be a hubexec bug.

    In your post above, does the COGID return the wrong value?
  • Peter Jakacki Posts: 10,193
    edited 2017-12-18 23:17
    Oh boy, so I tried this code:
    INITCOG
    		cogid	X
    		nop
    		wrbyte	X,#$0F0
    		loc	PTRA,#@IDLE		' default startup into Instruction Pointer
    		cogid	X
    		tjnz	X,#INITSTKS
    
    Then I examine location $F0 which should be 0 for console on cog 0 but it's 2. So I comment out the nop and I get a value of 3. This sure is interesting.......
    edit: I think this test may not be that reliable, since it relies upon the streamer and on the order in which the cogs run (whichever cog writes last wins), so I will try something a little different.
  • cgracey Posts: 13,631
    Oh boy, so I tried this code:
    INITCOG
    		cogid	X
    		nop
    		wrbyte	X,#$0F0
    		loc	PTRA,#@IDLE		' default startup into Instruction Pointer
    		cogid	X
    		tjnz	X,#INITSTKS
    
    Then I examine location $F0 which should be 0 for console on cog 0 but it's 2. So I comment out the nop and I get a value of 3. This sure is interesting.......

    Whoa!!!

    Maybe COGID is being stalled by a hubexec fetch, winding up with a stale ID from the hub. I think this is it.

    Let me recompile...
  • Peter Jakacki Posts: 10,193
    edited 2017-12-18 23:35
    I may have made a mistake by writing the value to the same hub location from every cog, since that relied upon the streamer. So I wrote to the cog's RAM instead, with this code, which still produces the radix prompt symptom but indicates that it is cog #0. However, I have a feeling that another cog also thinks it is supposed to be the console cog, so I will check further.
    INITCOG
    		nop
    		cogid	X
    		mov	clkdly,X
    		or	clkdly,#$80
    		loc	PTRA,#@IDLE		' default startup into Instruction Pointer
    		tjnz	X,#INITSTKS
    

    Even though the prompt shows $ instead of # it seems it correctly registered as cog#0 (or'd with $80 check pattern) at Tachyon internal register location #28 (clkdly).
    TAQOZ$ #28 COG@ .BYTE 80 ok
    

    The radix or base is stored as "user registers" for each cog in hub RAM but each user register area should be unique. The base for cog#0 is at $800.
    TAQOZ$ base .WORD 0816 ok
    
  • Peter Jakacki Posts: 10,193
    edited 2017-12-18 23:49
    Sorry about this, I'm still not sure but I don't think it is getting confused about the cogid. Look at this test which displays a symptom:
    INITCOG
    		cogid	X
    		mov	r1,X
    		add	r1,#$F0
    		wrbyte	r1,R1
    		loc	PTRA,#@IDLE		' default startup into Instruction Pointer
    		tjnz	X,#INITSTKS
    		drvh	#tx_pin			'set tx output high
    		loc	PTRA,#@TERMINAL
    		wrpin	#$7C,#tx_pin		' asynchronous transmit
    		wxpin	##baudval+7,#tx_pin	' baud 8 data
    

    It reports correctly for all cogs except cog #1, which is a serial cog.
    TAQOZ$ $F0 $10 DUMP 
    00.00F0: F0 00 F2 F3 F4 F5 F6 F7 00 00 00 00 00 00 00 00    ................ ok
    

    But this sure looks funny because they should all remain zeroed.
    TAQOZ$ $F0 $10 ERASE  ok
    TAQOZ$ $F0 $10 DUMP 
    00.00F0: 00 00 F2 F3 F4 F5 F6 F7 00 00 00 00 00 00 00 00    ................ ok
    
    So leave it to me as I will check further and cogid seems to be fine.
  • cgracey Posts: 13,631
    edited 2017-12-18 23:57
    Peter, you are using the BeMicro-A9 right now, right?