@TonyB_ said:
Does this mean no cog access at all to hub RAM when streaming longs at sysclk/2?
Correct. The cog locks up when I try. EDIT: Bear in mind it's actually sysclock/1 then. The naming I use in that program is carried over from earlier where it was all based on shortwords.
Assuming streamer is running at sysclk/D, what is value of D for each of these three?
BYTE  4 40000000 76720 76728 76720 76728 76720 76728 76720 76728 76720 76728 76720 76728
SHORT 4 40000000 87384 87384 87384 87384 87384 87384 87384 87384 87384 87384 87384 87384
LONG  4 40000000 87384 87384 87384 87384 87384 87384 87384 87384 87384 87384 87384 87384
Streamer NCO divider is stated there: $4000_0000, which is divide by 2. So hubRAM effective is 8, 4 and 2 respectively. But that's actually something that is a question mark right now. Those three numbers are based on an assumption that the FIFO decouples hubRAM accesses from the streamer cycles.
I think that's correct but measured behaviour isn't exactly reassuring right now.
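To make the 8, 4 and 2 figures concrete, here is a minimal Python sketch of the arithmetic. It assumes, as stated above, that an NCO value of $4000_0000 means sysclk/2 (so $8000_0000 would be sysclk/1), and that the FIFO fetches whole longwords, so BYTE mode consumes one longword per four NCO periods and SHORT mode one per two. The function names are made up for illustration and are not from the test program.

```python
# Assumption: streamer NCO value 0x8000_0000 corresponds to sysclk/1,
# so 0x4000_0000 is sysclk/2, matching the statement in the post.
NCO_UNITY = 0x8000_0000

# Streamer elements carried by one hubRAM longword in each RFxxxx mode.
ELEMENTS_PER_LONG = {"BYTE": 4, "SHORT": 2, "LONG": 1}


def streamer_divisor(nco):
    """Sysclk ticks per streamer element for a given NCO value."""
    return NCO_UNITY / nco


def effective_hub_divisor(nco, mode):
    """Sysclk ticks per longword fetched from hubRAM (hypothetical helper)."""
    return streamer_divisor(nco) * ELEMENTS_PER_LONG[mode]


if __name__ == "__main__":
    for mode in ("BYTE", "SHORT", "LONG"):
        print(mode, effective_hub_divisor(0x4000_0000, mode))
```

This only restates the assumption that the FIFO decouples hubRAM accesses from streamer cycles. It says nothing about how the FIFO actually groups its fetches.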
I think it's 2, 4 and 8.
Words take same time as longs at half word frequency.
Huh? The respective order is byte, short, long. Those values are effective dividers. Bytes have the highest effective divider (lowest rate).
It's kind of irrelevant. The NCO divider is plainly listed in third column. The second column is just an index for the NCO divider. ie:
QFRAC #1, index
Evan, your results are difficult to interpret. Have a look at the following table. Byte 2 is same as Word 4 and Long 4. This is wrong, it should be Long 8. All x are wrong for Long x.
Total smartpins = 64 1111111111111111111111111111111111111111111111111111111111111111
Rev B silicon. Sysclock 120.0000 MHz
Block Length is 65536
BYTE  2 87384 87384 87384 87384 87384 87384 87384 87384 87384 87384 87384 87384 12
WORD  2 131072 131072 131072 131072 131072 131072 131072 131072 131072 131072 131072 131072 12
BYTE  3 98304 78640 98288 78640 78640 98312 78648 98312 78640 78640 98288 78640 5
WORD  3 589784 589824 98304 98312 98304 98304 589792 98304 589808 589824 98304 98312 12
LONG  3 589808 589824 98304 98312 98304 98304 589792 98304 589808 589824 98304 98312 12
BYTE  4 76720 76728 76728 76728 76728 76728 76728 76728 76728 76728 76728 76728
WORD  4 87376 87384 87384 87384 87384 87384 87384 87384 87384 87384 87384 87384 12
LONG  4 87384 87384 87384 87384 87384 87384 87384 87384 87384 87384 87384 87384 12
BYTE  5 81904 81920 81912 81920 81912 81928 81920 81928 81920 81920 81912 81920 12
WORD  5 126032 126024 81920 109224 126024 81920 109216 81920 109232 126024 81920 109224 12
LONG  5 109232 126024 81920 109224 126024 81920 109216 81920 109232 126024 81920 109224 12
BYTE  6 74896 74904 74896 74904 74896 74904 74904 74896 74904 74904 74896 74904
WORD  6 98304 78640 78640 98296 78648 98312 98312 78648 98304 98304 78640 98304 7
LONG  6 98304 78640 78640 98296 78648 98312 98312 78648 98304 98304 78640 98304 7
BYTE  7 72432 72432 72440 72440 72440 72440 72432 72432 72440 72432 72440 72440
WORD  7 91752 91744 76456 91760 91752 91752 91752 76464 91752 91744 76456 91760 9
LONG  7 91744 91744 76456 91760 91752 91752 91752 76464 91752 91744 76456 91760 9
BYTE  8 71504 71496 71496 71496 71504 71496 71496 71496 71504 71496 71496 71496
WORD  8 76728 76728 76728 76728 76728 76728 76728 76728 76728 76728 76728 76728
LONG  8 76728 76728 76728 76728 76728 76728 76728 76728 76728 76728 76728 76728
BYTE  9 70784 70784 70784 70776 70776 70776 70776 70784 70776 70784 70784 70776
WORD  9 589760 76936 589800 589800 76936 76936 589760 76936 76936 76936 589800 589800 6
LONG  9 76936 76936 589800 589800 76936 76936 589760 76936 76936 76936 589800 589800 5
BYTE 10 70224 70216 70216 70216 70224 70216 70216 70224 70216 70224 70216 70216
WORD 10 81912 81920 81920 81920 81920 81920 81920 81920 81912 81920 81920 81920 12
LONG 10 81912 81920 81920 81920 81920 81920 81920 81920 81912 81920 81920 81920 12
BYTE 11 69776 69768 69760 69760 69768 69760 69768 69760 69760 69760 69768 69760
WORD 11 90112 90112 90080 90112 90104 90112 90096 90112 90112 90112 90080 90112 12
LONG 11 90080 90096 90080 90112 90104 90112 90096 90112 90112 90112 90080 90112 12
BYTE 12 69392 69392 69400 69392 69400 69392 69400 69392 69392 69392 69400 69392
WORD 12 74896 74904 74904 74896 74896 74896 74896 74896 74896 74904 74904 74896
LONG 12 74896 74904 74904 74896 74896 74896 74896 74896 74896 74904 74904 74896
BYTE 13 69088 69080 69088 69080 69088 69080 69080 69080 69088 69080 69088 69080
WORD 13 73032 73032 73032 77448 73024 77456 73032 77448 73024 77456 73024 77448
LONG 13 73032 73032 73032 77448 73024 77456 73032 77448 73024 77456 73024 77448
BYTE 14 68816 68816 68816 68816 68816 68808 68808 68816 68808 68816 68816 68816
WORD 14 72432 72432 72440 72440 72440 72440 72440 72440 72440 72432 72440 72440
LONG 14 72440 72432 72440 72440 72440 72440 72440 72440 72440 72432 72440 72440
BYTE 15 68584 68584 68584 68584 68592 68584 68584 68584 68592 68584 68584 68584
WORD 15 75616 75616 75616 75616 75624 75624 75616 75616 75616 75616 75616 75616
LONG 15 75616 75616 75616 75616 75624 75624 75616 75616 75616 75616 75616 75616
BYTE 16 68384 68384 68384 68384 68392 68384 68392 68384 68392 68384 68384 68384
WORD 16 71496 71504 71496 71496 71504 71496 71496 71504 71496 71504 71496 71496
LONG 16 71496 71504 71496 71496 71504 71496 71496 71504 71496 71504 71496 71496
BYTE 17 68208 68208 68208 68216 68208 68216 68208 68216 68208 68208 68208 68216
WORD 17 123776 123784 123736 123752 123760 71112 71112 71112 123784 123784 123736 123752 9
LONG 17 123768 123784 123736 123752 123760 71112 71112 71112 123784 123784 123736 123752 9
BYTE 18 68064 68064 68056 68056 68064 68064 68056 68056 68056 68056 68064 68064
WORD 18 70776 70776 70784 70776 70776 70784 70776 70784 70776 70776 70784 70776
LONG 18 70776 70776 70784 70776 70776 70784 70776 70784 70776 70776 70784 70776
BYTE 19 67920 67920 67928 67920 67920 67920 67920 67920 67928 67920 67928 67920
WORD 19 77816 77816 77816 77824 77808 77816 77816 77808 77808 77808 77816 77824
LONG 19 77816 77816 77816 77824 77808 77816 77816 77808 77808 77808 77816 77824
BYTE 20 67800 67792 67792 67792 67792 67800 67792 67800 67792 67800 67792 67792
WORD 20 70216 70216 70224 70224 70216 70216 70216 70224 70224 70216 70224 70224
LONG 20 70216 70216 70224 70224 70216 70216 70216 70224 70224 70216 70224 70224
BYTE 21 67688 67680 67688 67680 67688 67680 67680 67680 67688 67680 67688 67680
WORD 21 76456 76456 69984 76448 69976 76456 69976 69984 69984 69984 69984 76448
LONG 21 69976 69976 69984 76448 69976 76456 69976 69984 69984 69984 69984 76448
BYTE 22 67592 67592 67584 67592 67584 67584 67584 67592 67584 67584 67584 67592
WORD 22 69768 69760 69768 69760 69768 69760 69768 69760 69768 69760 69768 69760
LONG 22 69768 69760 69768 69760 69768 69760 69768 69760 69768 69760 69768 69760
BYTE 23 67488 67496 67496 67496 67488 67496 67488 67496 67488 67496 67496 67496
WORD 23 71776 71776 71776 71768 71776 71776 71784 71776 71776 71776 71776 71768
LONG 23 71776 71776 71776 71768 71776 71776 71784 71776 71776 71776 71776 71768
BYTE 24 67408 67408 67408 67416 67408 67416 67408 67416 67408 67408 67408 67416
WORD 24 69392 69400 69392 69400 69392 69392 69392 69400 69392 69400 69392 69400
LONG 24 69392 69400 69392 69400 69392 69392 69392 69400 69392 69400 69392 69400
BYTE 25 67336 67328 67336 67328 67336 67328 67328 67328 67336 67328 67336 67328
WORD 25 69224 69224 69224 96352 96368 96376 96336 96376 69224 69224 69224 96352 6
LONG 25 69224 69224 69224 96352 96368 96376 96336 96376 69224 69224 69224 96352 6
Here is the corrected table, which makes two points:
Total smartpins = 64 1111111111111111111111111111111111111111111111111111111111111111
Rev B silicon. Sysclock 120.0000 MHz
Fast Block Move Length = 65536 longs
Streamer frequency = sysclk/D
       D
BYTE   1
WORD   2 131072 131072 131072 131072 131072 131072 131072 131072 131072 131072 131072 131072 12
LONG   4
BYTE   -
WORD   3 589784 589824 98304 98312 98304 98304 589792 98304 589808 589824 98304 98312 12
LONG   6 589808 589824 98304 98312 98304 98304 589792 98304 589808 589824 98304 98312 12
BYTE   2 87384 87384 87384 87384 87384 87384 87384 87384 87384 87384 87384 87384 12
WORD   4 87376 87384 87384 87384 87384 87384 87384 87384 87384 87384 87384 87384 12
LONG   8 87384 87384 87384 87384 87384 87384 87384 87384 87384 87384 87384 87384 12
BYTE   -
WORD   5 126032 126024 81920 109224 126024 81920 109216 81920 109232 126024 81920 109224 12
LONG  10 109232 126024 81920 109224 126024 81920 109216 81920 109232 126024 81920 109224 12
BYTE   3 98304 78640 98288 78640 78640 98312 78648 98312 78640 78640 98288 78640 5
WORD   6 98304 78640 78640 98296 78648 98312 98312 78648 98304 98304 78640 98304 7
LONG  12 98304 78640 78640 98296 78648 98312 98312 78648 98304 98304 78640 98304 7
BYTE   -
WORD   7 91752 91744 76456 91760 91752 91752 91752 76464 91752 91744 76456 91760 9
LONG  14 91744 91744 76456 91760 91752 91752 91752 76464 91752 91744 76456 91760 9
BYTE   4 76720 76728 76728 76728 76728 76728 76728 76728 76728 76728 76728 76728
WORD   8 76728 76728 76728 76728 76728 76728 76728 76728 76728 76728 76728 76728
LONG  16 76728 76728 76728 76728 76728 76728 76728 76728 76728 76728 76728 76728
BYTE   -
WORD   9 589760 76936 589800 589800 76936 76936 589760 76936 76936 76936 589800 589800 6
LONG  18 76936 76936 589800 589800 76936 76936 589760 76936 76936 76936 589800 589800 5
BYTE   5 81904 81920 81912 81920 81912 81928 81920 81928 81920 81920 81912 81920 12
WORD  10 81912 81920 81920 81920 81920 81920 81920 81920 81912 81920 81920 81920 12
LONG  20 81912 81920 81920 81920 81920 81920 81920 81920 81912 81920 81920 81920 12
WORD  11 90112 90112 90080 90112 90104 90112 90096 90112 90112 90112 90080 90112 12
LONG  22 90080 90096 90080 90112 90104 90112 90096 90112 90112 90112 90080 90112 12
BYTE   6 74896 74904 74896 74904 74896 74904 74904 74896 74904 74904 74896 74904
WORD  12 74896 74904 74904 74896 74896 74896 74896 74896 74896 74904 74904 74896
LONG  24 74896 74904 74904 74896 74896 74896 74896 74896 74896 74904 74904 74896
BYTE   -
WORD  13 73032 73032 73032 77448 73024 77456 73032 77448 73024 77456 73024 77448
LONG  26 73032 73032 73032 77448 73024 77456 73032 77448 73024 77456 73024 77448
BYTE   7 72432 72432 72440 72440 72440 72440 72432 72432 72440 72432 72440 72440
WORD  14 72432 72432 72440 72440 72440 72440 72440 72440 72440 72432 72440 72440
LONG  28 72440 72432 72440 72440 72440 72440 72440 72440 72440 72432 72440 72440
BYTE   -
WORD  15 75616 75616 75616 75616 75624 75624 75616 75616 75616 75616 75616 75616
LONG  30 75616 75616 75616 75616 75624 75624 75616 75616 75616 75616 75616 75616
BYTE   8 71504 71496 71496 71496 71504 71496 71496 71496 71504 71496 71496 71496
WORD  16 71496 71504 71496 71496 71504 71496 71496 71504 71496 71504 71496 71496
LONG  32 71496 71504 71496 71496 71504 71496 71496 71504 71496 71504 71496 71496
BYTE   -
WORD  17 123776 123784 123736 123752 123760 71112 71112 71112 123784 123784 123736 123752 9
LONG  34 123768 123784 123736 123752 123760 71112 71112 71112 123784 123784 123736 123752 9
BYTE   9 70784 70784 70784 70776 70776 70776 70776 70784 70776 70784 70784 70776
WORD  18 70776 70776 70784 70776 70776 70784 70776 70784 70776 70776 70784 70776
LONG  36 70776 70776 70784 70776 70776 70784 70776 70784 70776 70776 70784 70776
BYTE   -
WORD  19 77816 77816 77816 77824 77808 77816 77816 77808 77808 77808 77816 77824
LONG  38 77816 77816 77816 77824 77808 77816 77816 77808 77808 77808 77816 77824
BYTE  10 70224 70216 70216 70216 70224 70216 70216 70224 70216 70224 70216 70216
WORD  20 70216 70216 70224 70224 70216 70216 70216 70224 70224 70216 70224 70224
LONG  40 70216 70216 70224 70224 70216 70216 70216 70224 70224 70216 70224 70224
BYTE   -
WORD  21 76456 76456 69984 76448 69976 76456 69976 69984 69984 69984 69984 76448
LONG  42 69976 69976 69984 76448 69976 76456 69976 69984 69984 69984 69984 76448
BYTE  11 69776 69768 69760 69760 69768 69760 69768 69760 69760 69760 69768 69760
WORD  22 69768 69760 69768 69760 69768 69760 69768 69760 69768 69760 69768 69760
LONG  44 69768 69760 69768 69760 69768 69760 69768 69760 69768 69760 69768 69760
BYTE   -
WORD  23 71776 71776 71776 71768 71776 71776 71784 71776 71776 71776 71776 71768
LONG  46 71776 71776 71776 71768 71776 71776 71784 71776 71776 71776 71776 71768
BYTE  12 69392 69392 69400 69392 69400 69392 69400 69392 69392 69392 69400 69392
WORD  24 69392 69400 69392 69400 69392 69392 69392 69400 69392 69400 69392 69400
LONG  48 69392 69400 69392 69400 69392 69392 69392 69400 69392 69400 69392 69400
BYTE   -
WORD  25 69224 69224 69224 96352 96368 96376 96336 96376 69224 69224 69224 96352 6
LONG  50 69224 69224 69224 96352 96368 96376 96336 96376 69224 69224 69224 96352 6
BYTE  13 69088 69080 69088 69080 69088 69080 69080 69080 69088 69080 69088 69080
WORD  26
LONG  52
BYTE  14 68816 68816 68816 68816 68816 68808 68808 68816 68808 68816 68816 68816
WORD  28
LONG  56
BYTE  15 68584 68584 68584 68584 68592 68584 68584 68584 68592 68584 68584 68584
WORD  30
LONG  60
BYTE  16 68384 68384 68384 68384 68392 68384 68392 68384 68392 68384 68384 68384
WORD  32
LONG  64
BYTE  17 68208 68208 68208 68216 68208 68216 68208 68216 68208 68208 68208 68216
WORD  34
LONG  68
BYTE  18 68064 68064 68056 68056 68064 68064 68056 68056 68056 68056 68064 68064
WORD  36
LONG  72
BYTE  19 67920 67920 67928 67920 67920 67920 67920 67920 67928 67920 67928 67920
WORD  38
LONG  76
BYTE  20 67800 67792 67792 67792 67792 67800 67792 67800 67792 67800 67792 67792
WORD  40
LONG  80
BYTE  21 67688 67680 67688 67680 67688 67680 67680 67680 67688 67680 67688 67680
WORD  42
LONG  84
BYTE  22 67592 67592 67584 67592 67584 67584 67584 67592 67584 67584 67584 67592
WORD  44
LONG  88
BYTE  23 67488 67496 67496 67496 67488 67496 67488 67496 67488 67496 67496 67496
WORD  46
LONG  92
BYTE  24 67408 67408 67408 67416 67408 67416 67408 67416 67408 67408 67408 67416
WORD  48
LONG  96
BYTE  25 67336 67328 67336 67328 67336 67328 67328 67328 67336 67328 67336 67328
WORD  50
LONG 100
Those are actual measurements. They are what they are.
The BYTE label means the streamer is in RFBYTE mode. As such, for a given streamer NCO rate, when compared to RFLONG, it only needs 1/4 of the bandwidth from hubRAM. Hence RFBYTE is effective /4 compared to RFLONG.
We can certainly ponder as to why some results stand out:
BYTE  9 1c71c71c 70784 70784 70784 70776 70776 70776 70776 70784 70776 70784 70784 70776
SHORT 9 1c71c71c 589760 76936 589800 589800 76936 76936 589760 76936 76936 76936 589800 589800 6
LONG  9 1c71c71c 76936 76936 589800 589800 76936 76936 589760 76936 76936 76936 589800 589800 5
It's to be expected that, for BYTE mode, the block-fill will complete faster than for SHORT or LONG mode, since BYTE mode uses the least bandwidth and makes the fewest bursts. And indeed the fill is quicker: 70784 ticks on the BYTE line vs 76936 ticks on the LONG line. Unimpeded is 65536 ticks.
More interesting is that there is no difference between SHORT and LONG. Both take 76936 ticks.
And of course, the eye-popping excursions out to 589760 ticks. Something is going off the rails with those.
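As a rough illustration of how far each of those figures sits from the unimpeded case, here is a small Python sketch that expresses measured ticks as a multiple of the 65536-tick baseline. The tick values are the ones quoted above; the helper name is made up for this example.

```python
UNIMPEDED_TICKS = 65536  # block length in longwords, one tick each when unimpeded


def slowdown(measured_ticks, unimpeded_ticks=UNIMPEDED_TICKS):
    """Measured block-fill time as a multiple of the unimpeded time."""
    return measured_ticks / unimpeded_ticks


if __name__ == "__main__":
    samples = [
        ("BYTE 9", 70784),
        ("LONG 9 (typical)", 76936),
        ("SHORT 9 (excursion)", 589760),
    ]
    for label, ticks in samples:
        print(f"{label}: {slowdown(ticks):.3f}x unimpeded")
```

Seen this way, the ordinary lines cost under 20% extra, while the excursions are close to nine times the unimpeded time.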
We need to consider the worst case address sequence possible to understand the difference between 1.5N and 9N clock cases for the same NCO divisor and element size.
It seems to me that the address requests from the two sources might be cycling through a pattern that regularly misses a hub window and wastes slots. But is there a real address pattern that can be issued by the FIFO and the COG's sequential burst transfer that would cause this and make sense? Sysclk divided by 9 seems especially bad. Is this because it is 1 + egg beater period (1+8)?
Using Tony's formula - https://forums.parallax.com/discussion/comment/1535610/#Comment_1535610
Take a rough stab of one longword per FIFO burst and voilà: T = N / (1 - CR/DB) => 65536 / (1 - (8 * 1) / (9 * 1)) => 589824
Yeah, but why isn't it always this value? How does it vary? It must relate to the address start conditions or (hidden?) FIFO state somehow...
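For anyone wanting to plug other numbers into that formula, here is a minimal Python sketch of it. The symbol meanings are assumptions read off the worked numbers: N is the block length in longwords, C the number of cogs (8), D the effective divisor (9), B the FIFO burst length in longwords (1), and R is simply kept as the factor of 1 used above. The function name is hypothetical.

```python
from fractions import Fraction


def predicted_ticks(n, c, d, b, r=1):
    """T = N / (1 - C*R / (D*B)), evaluated exactly with rationals.

    Assumed meanings: n = block length in longwords, c = number of cogs,
    d = effective divisor, b = FIFO burst length in longwords,
    r = the factor written as 1 in the post (not defined further here).
    """
    share = (Fraction(c) * Fraction(r)) / (Fraction(d) * Fraction(b))
    if share >= 1:
        raise ValueError("C*R/(D*B) must be below 1 for the formula to apply")
    return Fraction(n) / (1 - share)


if __name__ == "__main__":
    # One longword per FIFO burst at effective divisor 9.
    print(predicted_ticks(65536, c=8, d=9, b=1))
```

Using Fraction avoids float rounding, so the sysclk/9 case comes out as a whole number of ticks.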
That's what I want Chip to look into. It just shouldn't happen.
Attached is a longer run (freshly rerun) up to divider index 511.
Looking through the more primey dividers we do get to see fluctuations in SHORT and LONG measurements.
BYTE  97 02a3a0fe 65984 65992 65984 65992 65984 65992 65984 65992 65984 65992 65984 65992
SHORT 97 02a3a0fe 71416 71424 71392 66448 66448 66448 71400 71408 71416 71424 71392 66448
LONG  97 02a3a0fe 71416 71424 71392 66448 66448 66448 71400 71408 71416 71424 71392 66448
Those are likely expected examples of beat patterns upping the interference. So nothing to worry about there I don't think.
Although, still the question of why none of these are afflicting the BYTE lines of measurements. Might be related to the more extreme cases (which also don't affect the BYTE lines).
I think the only way to explain how LONG measurements come out the same as SHORT measurements has to be because each FIFO burst is double length. Namely 12 longwords at a time.
And impressively this somehow doesn't incur any extra stalls on the block writes. Which is explainable by the burst being 1.5 hubRAM rotations and the remaining 0.5 is perfect to not trigger a second stall in that burst.
There is no explanation because LONG and WORD measurements are not the same for same sysclk divisor.
The fact that they aren't the same is exactly why they should measure differently ... but don't. So an explanation is needed.
Well, Roger,
In my idleness I've now rearranged my diagnostic code from residing in lutRAM to now residing in cogRAM ... then added lutRAM prefilling and hubRAM verifying of the block copy ... and results are all good. Not a single failed check on the content written to hubRAM.
New column on the end of each line showing the number of longword match fails between lutRAM and hubRAM. Should always be zero.
Block Length is 65536
BYTE  2 80000000 87384 87392 87392 87392 87392 87384 87384 87384 87384 87392 0
SHORT 2 80000000 131088 131088 131088 131080 131080 131096 131096 131088 131088 131088 0
BYTE  3 55555556 78640 78648 98304 78640 98296 98304 78640 98296 78640 78648 0
SHORT 3 55555556 589776 98320 589792 98304 589816 98312 589808 98312 589776 98320 0
LONG  3 55555556 196624 196640 196616 196632 196632 196624 196624 196624 196624 196640 0
BYTE  4 40000000 76728 76728 76720 76720 76720 76720 76720 76720 76728 76728 0
SHORT 4 40000000 87384 87384 87392 87392 87392 87392 87384 87384 87384 87384 0
LONG  4 40000000 131088 131088 131088 131064 131080 131072 131096 131072 131088 131088 0
BYTE  5 33333334 81920 81920 81920 81912 81912 81904 81912 81920 81920 81920 0
SHORT 5 33333334 109224 126040 109224 81920 109224 126024 109216 81920 109224 126040 0
LONG  5 33333334 218448 218440 218440 218432 218448 218456 218472 218448 218448 218440 0
BYTE  6 2aaaaaaa 74896 74896 74904 74896 74896 74896 74896 74896 74896 74896 0
SHORT 6 2aaaaaaa 98296 78640 78648 98304 78640 98304 98304 78640 98296 78640 0
LONG  6 2aaaaaaa 98312 589776 98320 589808 98304 589824 98304 589776 98312 589776 0
BYTE  7 24924924 72432 72440 72432 72440 72432 72432 72432 72440 72432 72440 0
SHORT 7 24924924 91752 91744 76456 91744 76456 91744 91744 91752 91752 91744 0
LONG  7 24924924 172032 172024 172032 172032 172008 172032 172016 172040 172032 172024 0
BYTE  8 20000000 71496 71488 71496 71488 71488 71488 71496 71488 71496 71488 0
SHORT 8 20000000 76728 76728 76728 76728 76720 76720 76720 76720 76728 76728 0
LONG  8 20000000 87384 87384 87384 87384 87392 87392 87392 87392 87384 87384 0
BYTE  9 1c71c71c 70784 70776 70776 70776 70784 70776 70784 70776 70784 70776 0
SHORT 9 1c71c71c 589720 589792 589792 589792 76928 76928 76936 589648 589720 589792 0
LONG  9 1c71c71c 118000 117960 117968 117976 117936 117968 117976 117992 118000 117960 0
Update: Fix bug where it was only verifying hubRAM after last run of each line.
Update2: Fix bug with not testing longword streamer mode. Had doubled up on shortword! Doh!
Evan, please study this:
https://forums.parallax.com/discussion/comment/1535680/#Comment_1535680
Tony,
There's a catch with those equations: the burst size and interval are the unknowns we're trying to discover here.
It looks backwards.
Ooooops! All this time and I hadn't double checked the streamer modes in the tests. The reason why SHORT and LONG tests were the same is because I'd duplicated the shortword mode then not modified it for longword streaming.
Source code and example report are now re-posted above - https://forums.parallax.com/discussion/comment/1535803/#Comment_1535803
Sounds good. It would be rather bad if there was some type of HW bug here where a request arriving on a particular clock cycle, under certain FIFO+SETQ burst load conditions, somehow messed up the transfer or the FIFO occupancy. It's good to rule it out.
For some reason the last column pasted above has some non-zeroes, while the second last column is zero. I guess you meant the second last column is the new data check result.
EDIT: LOL, looks like you just fixed it in an edited post..
Hooray!
Right, I'm satisfied everything is solved except why the FIFO would ever burst less than six longwords at once.
EDIT: Here's an example set of calculations for the #5 divider index based on Tony's work but reoriented toward burst length discovery. It shows why this divider performs poorly on many occasions.
BYTE  5 33333334 81920 81920 81920 81912 81912 81904 81912 81920 81920 81920 0
SHORT 5 33333334 109224 126040 109224 81920 109224 126024 109216 81920 109224 126040 0
LONG  5 33333334 218448 218440 218440 218432 218448 218456 218472 218448 218448 218440 0

C (number of cogs) = 8
D (effective divisor) = 10 (BYTE), 5 (SHORT), 2.5 (LONG)
Tu (Unimpeded Ticks) = 65536
Tm (Measured Ticks) = 81920, 126040, 218448
Te (Extra ticks) = Tm - Tu => 81920 - 65536 = 16384, 126040 - 65536 = 60504, 218448 - 65536 = 152912
S (Stalls) = Te / C => 16384 / 8 = 2048, 60504 / 8 = 7563, 152912 / 8 = 19114

Lb (FIFO burst length in longwords) = (C / D) * Tm / Te
=> Lb = (C / D) * Tm / (Tm - Tu)
=> Lb = C / (D * (1 - Tu/Tm))

Lbb = 8 / (10 * (1 - 65536 / 81920)) = 4
Lbs = 8 / (5 * (1 - 65536 / 81920)) = 8
Lbl = 8 / (2.5 * (1 - 65536 / 218448)) = 4.57
Lbs = 8 / (5 * (1 - 65536 / 109224)) = 4
Lbs = 8 / (5 * (1 - 65536 / 126040)) = 3.333
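The same arithmetic can be rerun against other lines of the report with a short Python sketch like the one below. It only restates the Lb and S formulas from this post; the function names and the default values for C and Tu are choices made for this example rather than anything taken from the test program.

```python
from fractions import Fraction


def stalls(tm, c=8, tu=65536):
    """S = Te / C, where Te = Tm - Tu is the extra ticks over unimpeded."""
    return Fraction(tm - tu, c)


def burst_length(tm, d, c=8, tu=65536):
    """Lb = C / (D * (1 - Tu/Tm)), the implied FIFO burst length in longwords.

    tm = measured ticks, d = effective divisor, c = number of cogs,
    tu = unimpeded ticks. Uses rationals so exact cases stay exact.
    """
    if tm <= tu:
        raise ValueError("measured ticks must exceed unimpeded ticks")
    return Fraction(c) / (Fraction(d) * (1 - Fraction(tu, tm)))


if __name__ == "__main__":
    cases = [
        ("BYTE", 10, 81920),
        ("SHORT", 5, 81920),
        ("SHORT", 5, 109224),
        ("SHORT", 5, 126040),
        ("LONG", 2.5, 218448),
    ]
    for mode, d, tm in cases:
        lb = float(burst_length(tm, d))
        print(f"{mode:5} D={d:<4} Tm={tm:<6} S={float(stalls(tm)):.0f} Lb={lb:.3f}")
```

Feeding in measurements from other divider indexes only needs the matching effective divisor for each mode.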
Good.