Any way to extract S field of a register to another in a single instruction?

evanh · 2020-07-01 08:12

Maybe move the code for flash writes into hubexec.

AJL · 2020-07-01 08:27

Roger,

Is 'a' always zero on entry? If not that could cause this code to fail in circumstances where (a+count+(addr & $1ff))>512.

Otherwise, a hand trace tells me the code should work.

With addr1 holding 1024 ($400) the alts will pass a zero through to be added to 'a'.

With a count of 2 you should come out of this with 'a' holding zero, 'c' cleared, and 'z' set.

What results are you getting?

rogloh · 2020-07-01 08:56

evanh wrote: »

Maybe move the code for flash writes into hubexec.

No way.

After looking through this I think I have some workarounds. Instead of failing when count + (addr1 & $1ff) is greater than 512, I can limit the transfer to the bytes remaining in the current flash line. It just means a client that tries to cross a flash line boundary will only write up to the next 512 byte boundary, and the request will still succeed (with no error message indicating this happened). We can document the fact that any flash burst that would cross a 512 byte line boundary is automatically truncated. Shouldn't be a problem. It's a good idea to fragment the writes for other reasons anyway: it allows notification of progress on a very large write instead of waiting until the end, and it enables QoS for video, because unlike HyperRAM the HyperFlash writes cannot get fragmented by the driver (doing that would cause programming failure because the special setup sequence required before every flash write is done by the client).

So instead of this:

'..................................................................................................
' HYPER WRITES without latency                            
                                                            ' a b c

                                                            ' W B R  (a) word write
                                                            ' O U E  (b) burst write
                                                            ' R R G  (c) register write
                                                            ' D S 
                                                            '   T
w_flash_burst               tjz     count, #nowrite_lut     '   b     ensure count is non-zero
                            alts    addr1, #$1ff            '   b     get 9 LSBs of flash address-1
                            mov     a, #0-0                 '   b     ...into a temp reg
                            add     a, count                '   b     add the transfer count
                            andn    a, #$1ff wcz            '   b     clear lower bits, Z=1 & C=0 when ok
            if_nz           jmp     #w_abort                '   b     exit if this wraps past 512 byte boundary

I can do this....

w_flash_burst               tjz     count, #nowrite_lut     '   b     ensure count is non-zero
                            mov     a, #0 wcz               '   b     set z, clear c, prepare a reg
                            sets    a, addr1                '   b     get the 9 LSBs of address
                                                            '   b     hidden augs
                            subr    a, ##512                '   b     set a=512-(addr1 & $1ff)
                            fle     count, a                '   b     limit the bytes transferred to not cross a boundary
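
As a cross-check on the arithmetic, here is a minimal Python model of the truncating sequence above. It assumes SETS copies the 9 LSBs of addr1 into "a", SUBR computes 512 minus "a", and FLE is an unsigned limit; the function name `clamp_burst` is made up for illustration and is not part of the driver.

```python
LINE_SIZE = 512  # flash line size in bytes, taken from the discussion


def clamp_burst(addr1: int, count: int) -> int:
    """Model of: mov a,#0 / sets a,addr1 / subr a,##512 / fle count,a."""
    a = addr1 & 0x1FF      # SETS: a holds the offset of addr1 within its line
    a = LINE_SIZE - a      # SUBR: bytes left before the next line boundary
    return min(count, a)   # FLE: count is limited to the bytes left


print(clamp_burst(0x5FE, 10))  # only 2 bytes are left in this line
```

With this model even a very large count is cut back to at most 512 bytes, which is the behaviour the truncation approach is meant to give.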

Alternatively I could retain the error, but then I need to find a spare COGRAM long for the 512 constant, which is easier to find than LUT RAM, though I'm still hunting.

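w_flash_burst               tjz     count, #nowrite_lut     ' ensure count is non-zero
                            mov     a, #0 wz                ' clear a, z=1
                            sets    a, addr1                ' a = addr1 & $1ff
                            add     a, count                ' a = offset within line + count
                            fle     a, ##512 wc             ' c=1 if a was greater than 512
        if_c                jmp     #unsupported            ' reject bursts that cross the boundary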

Update: I found a spare COGRAM long!
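
For comparison, a small Python model of the error-reporting variant above, assuming FLE with WC sets C only when the destination was greater than the limit. The name `burst_rejected` is hypothetical, and registers are modelled as 32-bit values.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


def burst_rejected(addr1: int, count: int) -> bool:
    """Model of: mov a,#0 / sets a,addr1 / add a,count / fle a,##512 wc."""
    a = ((addr1 & 0x1FF) + count) & MASK32  # offset within the line plus count
    return a > 512                          # C flag after FLE ... WC


print(burst_rejected(0x400, 512))  # False: a full line from an aligned start
print(burst_rejected(0x400, 513))  # True: one byte too many
```

A burst that ends exactly on the line boundary is accepted in this model, since the limit is only exceeded when offset plus count is greater than 512.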

rogloh · 2020-07-01 09:00

AJL wrote: »

Roger,

Is 'a' always zero on entry? If not that could cause this code to fail in circumstances where (a+count+(addr & $1ff))>512.

Otherwise, a hand trace tells me the code should work.

With addr1 holding 1024 ($400) the alts will pass a zero through to be added to 'a'.

With a count of 2 you should come out of this with 'a' holding zero, 'c' cleared, and 'z' set.

What results are you getting?

Register "a" gets set explicitly in the mov instruction, so its value on entry is arbitrary. I thought it would work too a month ago when I checked it visually, but after using it I found it fails when the 9 LSBs of addr1 are all zeroes (e.g. 0, 512, 1024 etc). With these sample numbers "a" comes out at $201 and it then fails the andn test.

AJL · 2020-07-02 06:52

rogloh wrote: »

Register "a" gets set explicitly in the mov instruction, so its value on entry is arbitrary. I thought it would work too a month ago when I checked it visually, but after using it I found it fails when the 9 LSBs of addr1 are all zeroes (e.g. 0, 512, 1024 etc). With these sample numbers "a" comes out at $201 and it then fails the andn test.

I obviously had a vague moment when I looked at this yesterday.

Of course 'a' is overwritten, but for addr1 of $400 the alts doesn't pass a zero, it passes a $1ff, which then fails for a count of 2 or more when 512 is the limit.
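
The failure can be reproduced with a short Python model of the original alts/mov/add/andn sequence. This is a sketch that assumes ALTS substitutes (D + S) & $1FF as the S field of the following instruction; `original_check_rejects` is an illustrative name only.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


def original_check_rejects(addr1: int, count: int) -> bool:
    """Model of: alts addr1,#$1ff / mov a,#0-0 / add a,count / andn a,#$1ff wcz."""
    s = (addr1 + 0x1FF) & 0x1FF      # value ALTS patches into the MOV
    a = (s + count) & MASK32         # mov a,#s followed by add a,count
    return (a & ~0x1FF & MASK32) != 0  # non-zero after ANDN means abort


print(hex((0x400 + 0x1FF) & 0x1FF))       # 0x1ff is passed, not zero
print(original_check_rejects(0x400, 2))   # True: the aligned case aborts
```

With addr1 = $400 and a count of 2 the model gives a = $201 before the andn, matching the value reported above.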

Your new code has the added advantage of working the way I think most people would do it by hand.

rogloh · 2020-07-02 07:56

AJL wrote: »

Your new code has the added advantage of working the way I think most people would do it by hand.

Yes I have coded up the first variant where it still allows some flash writes up to the 512 byte flash line buffer boundary then truncates it. This stuff all gets hidden in my SPIN2 API anyway so it shouldn't ever need to occur. I just needed a way to handle situations with direct PASM clients requesting flash burst writes in the mailbox. The caller should still be breaking up its own write requests anyway to avoid hitting this.

The new SPIN2 API I am putting together can do these four programming operations at any flash address, and the API will now deal with odd addresses and odd lengths and handle them appropriately. Obviously it makes sense to not go and program one item at a time, for both endurance and speed reasons, but these extra methods are in there for convenience and can help "patch" HyperFlash locations if ever required.

PUB programFlash(addr, srcHubAddr, byteCount) : r 
PUB programFlashByte(addr, data) : r
PUB programFlashWord(addr, data) : r 
PUB programFlashLong(addr, data) : r

For bursts using programFlash(), if the start write address plus the length exceeds the range of flash memory, it will only write up to the end of the flash memory, and any single byte writes required at the start or end are converted into words with either the upper or lower byte set as $FF. Any PASM client doing flash writes would need to do the same or it will encounter alignment errors.

I may add a fourth argument for programFlash to control the size of each burst and prevent potential real time video corruption. The default right now is to always use the highest performance 512 bytes, but this could be shortened to say 64 or 128 if this maximum write burst size adds too much latency for any video COG. It's unlikely, but in some applications programming flash may still impact the video, given we can't fragment its requests internally in the driver, unlike the other RAM banks. Another option is to reuse the known flash read burst size as the write burst size (rounded down to a power of 2 so that an integer number of write bursts fits within the 512 byte line size). That way it only needs to be set in one place and is handled automatically.
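
The client-side splitting described above could look roughly like the following Python sketch. It only shows how a request might be cut into bursts that never cross a 512 byte line; the function `split_burst` and its `max_burst` parameter are assumptions for illustration and are not the actual SPIN2 API, and the $FF byte padding for odd addresses and lengths is left out.

```python
def split_burst(addr: int, count: int, max_burst: int = 512,
                line_size: int = 512) -> list[tuple[int, int]]:
    """Return (address, length) chunks that stay inside one flash line each."""
    if max_burst <= 0 or line_size <= 0:
        raise ValueError("burst and line sizes must be positive")
    chunks = []
    while count > 0:
        room = line_size - (addr % line_size)  # bytes left in this line
        n = min(count, room, max_burst)        # never cross the boundary
        chunks.append((addr, n))
        addr += n
        count -= n
    return chunks


print(split_burst(0x1F0, 0x30))  # [(496, 16), (512, 32)]
```

Choosing max_burst as a power of 2 that divides 512 keeps every chunk of an aligned request the same size, which is the reason for the rounding mentioned above.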

ersmith · 2020-07-02 15:11

Would something like this work? The idea is to calculate addr + count and see if the upper bits are the same as addr:

	tjz	count, #nowrite_lut
	mov	a, addr
	add	a, count	' a = addr + count
	xor	a, addr		' check which bits changed from addr
	andn	a, #$1ff wz	' ignore changes in low bits
  if_nz	jmp	#sizeerror

Edit: oops, I haven't made sure C = 0. Probably we could clear C in the add, assuming a + count cannot overflow.
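
A quick Python model of the xor idea, written as a sketch with 32-bit wrapping assumed and a made-up function name. One thing it suggests is that addr + count is the first address after the burst, so a burst ending exactly on the line boundary would also be flagged; whether that matters depends on how the driver wants to treat that case.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


def xor_check_flags_error(addr: int, count: int) -> bool:
    """Model of: mov a,addr / add a,count / xor a,addr / andn a,#$1ff wz."""
    a = (addr + count) & MASK32        # a = addr + count
    a ^= addr                          # bits that changed relative to addr
    return (a & ~0x1FF & MASK32) != 0  # any change above the low 9 bits


print(xor_check_flags_error(0x400, 2))    # False: stays inside the line
print(xor_check_flags_error(0x5FE, 4))    # True: crosses into the next line
print(xor_check_flags_error(0x400, 512))  # True: ends exactly on the boundary
```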

rogloh · 2020-07-02 23:04

Very nice, yes that could also work Eric, as I have since rearranged other error handling and figured out a way to avoid the need to even clear C now.

In the end I may simply stick to just the truncation method and document this behaviour. It also protects against sending down massive values of count that could otherwise hang in the transfer loop for a long time. E.g. what if the count was $FFFF_FFFF or something very high, so that register "a" was still zero after the test? It could try to write over and over to the same flash line, which may not be ideal and certainly could leave the flash data line corrupted due to the ANDing of its written bits.
