Doh! I'd left the -2 off the compile options, so it had been compiling for Prop1.
EDIT: It WORKS now.
Lol, Wine has a "Wordpad" lookalike that is better at viewing RTF docs than LibreWriter is - Doesn't try to make pages out of it and is much more responsive.
EDIT2: Wow, the memories are back - Entering code in a terminal with line numbers somehow instantly transports me back to school days.
Bug report: I've saved a program and can reload and rerun it no problem. But FILES lists the new name and then continuously spews blank lines, never returning to command entry.
Detail: Blocks free has decreased by one between the first bootup message and the current one.
I've cut another 16 clocks off the overhead ... but it relies on a Flexspin specific directive. So, not as portable.
It's simple enough, use ASM/ENDASM to move the non-repeating setup code back out to the calling hubRAM. A smaller block copy into cogRAM is then done quicker.
Thanks evanh. The "missing" free block is the one that's holding the program you saved.
Given the overhead of the interpreter, there's a point where speeding up SPI I/O or any few things makes little difference in overall execution speed. In addition, a "high performance" driver for flash memory should have a decent wear leveling algorithm, subdirectories, and good behavior on power failure. Hmmm ... sounds like SD cards and something like FAT16?
@"Mike Green" said:
Thanks evanh. The "missing" free block is the one that's holding the program you saved.
Yep. Just providing what info I have. FILES doesn't like something.
Given the overhead of the interpreter, there's a point where speeding up SPI I/O or any few things makes little difference in overall execution speed. In addition, a "high performance" driver for flash memory should have a decent wear leveling algorithm, subdirectories, and good behavior on power failure. Hmmm ... sounds like SD cards and something like FAT16?
SD cards will do wear levelling at the block level irrespective of filesystem in use. It's definitely not a FAT16/32 feature.
INPUT appears to work properly with more than one variable. FILES also seems to work properly.
Yes, SD cards do wear levelling at the block level. Some filesystems like FAT16 provide for subdirectories. Journalling file systems provide one way of protecting from power glitches and the like. I'm just saying that this filesystem is very primitive, but probably appropriate for a little Basic interpreter or an application that may benefit from overlays or data logging.
This is for my own posterity as much as anything ... What triggered my attention was a diagram that Chip/Jeff has drawn up in the new draft hardware doc ... I became a little puzzled over why so many sysclock ticks of lag were effective. And they are. But it turns out my inline code comments weren't very precise as to why. I had proven to myself that it worked but without full understanding.
After some more testing I've now clarified the relationship between instructions and I/O stages. Turns out the staging latencies introduce an even number of ticks from OUT to TESTP, namely 8 (without registered I/O), whereas OUT to IN is one extra tick on top, making 9.
So, not the 13 I had mentioned in the old source code, but it's still helpful to use all the available time of an SPI clock period. E.g. without factoring in SPI device response time, just the Prop2 I/O pins themselves need +2 ticks when above 320 MHz sysclock.
Therefore, the stance is to use everything available, which is 8 (min) + 6 = 14 ticks. Anything more and the +6 will step into the next 8-tick SPI clock period. Well, 8+7 is theoretically doable but that doesn't suit the 2-tick instruction intervals.
Here's the same code but with better commenting:
' Bit-bashed SPI byte receiver, CPHA=1 (SPI clocking modes 1 and 3)
outnot CLK ' I/O lag start
nop ' 2
outnot CLK ' 4 First clock edge now at physical pin
rep @rx_rend, #7 ' 6
outnot CLK ' 8 (CPHA=1) min TESTP rx lag, +1 if using IN reg
rcl sD, #1 ' 10 As well as internal latencies, externals also need to be
outnot CLK ' 12 allowed for when at high frequencies. So use all spare ...
testp DO wc ' 14 +6 to minimum lag (Eight ticks per bit)
rx_rend
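The tick budget described above can be checked with a little arithmetic. This is only a Python sketch of the reasoning, not Prop2 code: the constant and function names are made up for illustration, and it assumes every instruction in the loop starts on a 2-tick boundary counted from the first OUTNOT, as the tick numbers in the comments suggest.

```python
MIN_LAG_TESTP = 8   # ticks from OUT to TESTP without registered I/O (figure from the post)
SPI_PERIOD = 8      # ticks per SPI clock period in this loop ("Eight ticks per bit")
INSTR_TICKS = 2     # ASSUMPTION: each instruction in the loop takes 2 ticks


def latest_sample_tick():
    """Latest TESTP tick that still samples the current bit.

    Sampling must happen before MIN_LAG_TESTP + SPI_PERIOD, where the next
    bit's data can start to arrive, and it has to land on an instruction
    boundary.
    """
    limit = MIN_LAG_TESTP + SPI_PERIOD  # tick 16 belongs to the next bit
    return max(range(MIN_LAG_TESTP, limit, INSTR_TICKS))


def ticks_to_ns(ticks, sysclock_hz):
    """Convert a number of sysclock ticks to nanoseconds."""
    return ticks / sysclock_hz * 1e9


print(latest_sample_tick())
print(ticks_to_ns(2, 320e6))
```

The candidates come out as 8, 10, 12 and 14, so 14 is the last usable slot. Tick 15 (the 8+7 case) is inside the limit but never lands on an instruction boundary.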
Comments
For now, INPUT works only for a single variable with or without a prompt. PRINT and WRITE also seem to have problems with more than one variable.
Oops ... here's the archive.
Solved: the highest 64 kB chunk of the EEPROM mostly contained zeros. Once I erased that area, FILES behaved.
Here's my diagnostic printout before erasing:
When the first 12 bytes of a 4K block are zeros, FILES thinks that this is the first block in a file with zeros as its name. That'll display as spaces.
It certainly did that alright. Never seemed to end though.
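To picture what FILES was tripping over, here is a rough Python sketch of a directory scan over 4K blocks. It is not the FemtoBasic code. The names are hypothetical, and it assumes that an erased block reads back as 0xFF and that every non-erased block is the first block of a file (the real layout must also tell continuation blocks apart, which isn't described here).

```python
BLOCK_SIZE = 4096  # 4K blocks, as described above
NAME_LEN = 12      # the first 12 bytes of a file's first block hold its name


def list_names(image, skip_zero_names=False):
    """Return the display names a naive scan would find in a memory image."""
    names = []
    for offset in range(0, len(image), BLOCK_SIZE):
        raw = image[offset:offset + NAME_LEN]
        if raw == b"\xff" * NAME_LEN:  # ASSUMPTION: erased memory reads as 0xFF
            continue
        if skip_zero_names and raw == bytes(NAME_LEN):
            continue  # hypothetical guard: an all-zero name is not a real file
        # Zero bytes are shown as spaces here, mimicking the blank output.
        names.append(raw.decode("ascii", errors="replace").replace("\x00", " "))
    return names


erased = b"\xff" * BLOCK_SIZE
saved = b"testdata".ljust(NAME_LEN, b"\x00") + b"\xff" * (BLOCK_SIZE - NAME_LEN)
zeroed = bytes(BLOCK_SIZE)
image = erased + saved + zeroed

print(list_names(image))
print(list_names(image, skip_zero_names=True))
```

With the zero-filled block present, the naive scan reports an extra "file" whose name is twelve blanks. Erasing that area, or guarding against all-zero names, makes it go away.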
Some examples of what this FemtoBasic implementation can do:
90 OPEN "testdata",w
100 FOR i=0 TO 255
110 LET p=3.1415926
120 LET a=SIN ((i/256.0) * p)
130 LET b=COS ((i/256.0) * p)
140 WRITE i;", ";a;", ";b
150 NEXT i
160 CLOSE
170 END
This creates (or replaces) a file "testdata" with 256 lines of comma-separated values: i (the line number), the sine of (i x pi) / 256, and the cosine of (i x pi) / 256.
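For checking the file contents on a PC, the same values can be produced with a few lines of Python. This is only a cross-check sketch: the function name is made up, and the number formatting will not match what FemtoBasic's WRITE produces.

```python
import math


def make_testdata(count=256, pi=3.1415926):
    """Build the same i, sine, cosine lines as the BASIC program above."""
    lines = []
    for i in range(count):
        angle = (i / 256.0) * pi  # same expression as BASIC lines 120 and 130
        lines.append(f"{i}, {math.sin(angle)}, {math.cos(angle)}")
    return lines


lines = make_testdata()
print(lines[0])
print(len(lines))
```

The angle sweeps from 0 to just under pi, so the sine column rises to 1 at i=128 and falls back towards 0, while the cosine column runs from 1 down to nearly -1.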
100 OPEN "testdata",r
110 READ i,a,b
120 PRINT i,a,b
130 GOTO 110
This reads the "testdata" file and PRINTs each line of data. Because the GOTO loop never stops on its own, the program gets an error after the last line of data is read. The error can be avoided by reading a fixed number of lines when that number is known:
100 OPEN "testdata",r
105 FOR i=0 TO 255
110 READ i,a,b
120 PRINT i,a,b
130 NEXT i
140 END
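The difference between the two read loops can be shown with a Python analogy. This is not FemtoBasic. The function names are invented, and Python's StopIteration merely stands in for the error that READ gives once the data has run out.

```python
def parse_line(line):
    """Split one 'i, a, b' line into an int and two floats."""
    i, a, b = line.split(",")
    return int(i), float(a), float(b)


def read_fixed(lines, count):
    """Like the FOR/NEXT version: read exactly count lines."""
    source = iter(lines)
    return [parse_line(next(source)) for _ in range(count)]


def read_until_end(lines):
    """Like the GOTO version, but catching the end-of-data condition."""
    rows = []
    source = iter(lines)
    while True:
        try:
            rows.append(parse_line(next(source)))
        except StopIteration:  # stands in for the BASIC error after the last line
            break
    return rows


sample = ["0, 0.0, 1.0", "1, 0.5, 0.5"]
print(read_fixed(sample, 2))
print(read_until_end(sample))
```

Asking read_fixed for more lines than exist fails in the same way the GOTO loop does, which is why the fixed-count version only helps when the count is known in advance.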
Smartpins and streamers are different again ...
My prior testing of smartpins suggests the combined input to output smartpin response is 4 ticks, including a tick for processing. So that'll be one staging register each for input and output routing in the silicon. Quite favourable compared to the 8 or 9 ticks for a cog.
Some more details:
For just pin output, at low sysclock frequency, an unregistered output pin will transition 3 ticks after OUTx instruction has completed. Under same conditions, the pin will transition 1 tick after a smartpin does the same.
Pin input, at low sysclock frequency, always has one tick added for input settling ahead of the front register. So 1+4 ticks from pin to TESTP, 1+5 to IN. And 1+1 from pin to smartpin.
You'll note that's a total of only 3 ticks for combined smartpin I/O. To get the needed 4, add one for smartpin processing.
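Adding up the figures quoted above gives the round-trip numbers. The Python below is just that tally, with made-up constant names. The values are the ones from this post, for low sysclock frequency and unregistered pins.

```python
COG_OUT = 3        # OUTx completion to pin transition
INPUT_SETTLE = 1   # input settling ahead of the front register
PIN_TO_TESTP = 4   # after settling, pin to TESTP
PIN_TO_IN = 5      # after settling, pin to IN
SMART_OUT = 1      # smartpin output to pin transition
PIN_TO_SMART = 1   # after settling, pin to smartpin
SMART_PROCESS = 1  # one tick of smartpin processing

cog_testp = COG_OUT + INPUT_SETTLE + PIN_TO_TESTP
cog_in = COG_OUT + INPUT_SETTLE + PIN_TO_IN
smart_io = SMART_OUT + INPUT_SETTLE + PIN_TO_SMART
smart_total = smart_io + SMART_PROCESS

print(cog_testp, cog_in, smart_io, smart_total)
```

The cog totals match the 8 (TESTP) and 9 (IN) ticks measured earlier, and the smartpin total matches the 4-tick response.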
PS: OUT being shorter latency than IN is something of an illusion. Those numbers are all referenced to instruction completion, which skews them. There are three I/O routing stages each way for the cogs.
PPS: Actually, it's really hard to know these internal relationships with instruction execution. My numbers above could easily be out by one so not skewed at all. ie: It could take an extra tick to get out and one less tick to arrive in. Doesn't change the round trip, which is all that really matters.