Getting data into the prop *fast*
OwenS
Does anyone have any idea how to get data from a 16-bit wide port into hub RAM fast?
And I'm talking 2 MB/sec or faster.
I was thinking about multiplexing data, address and clock this way:
    CLK Phase   High   Low
    P0          A0     D0
    P1          A1     D1
    P2          A2     D2
    P3          A3     D3
    P4          A4     D4
    P5          A5     D5
    P6          A6     D6
    P7          A7     D7
    P8          A8     WR / /RD
    P9          A9     ign
    P10         A10    ign
    P11         A11    ign
    P12         A12    ign
    P13         A13    ign
    P14         A14    ign
    P15         CLK    CLK
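To make the multiplexing concrete, here is a rough Python model (not PASM) of how the receiving side could split the two samples of one transfer. It follows the pin table above; the function name, the dictionary keys, and the assumption that a high P8 means "write" are mine, not part of the original design.

```python
# Bit layout taken from the pin table; everything else is illustrative.
ADDR_MASK = 0x7FFF   # P0..P14 carry A0..A14 while CLK is high
DATA_MASK = 0x00FF   # P0..P7 carry D0..D7 while CLK is low
WR_BIT = 1 << 8      # P8 carries WR / /RD while CLK is low (assumed 1 = write)
CLK_BIT = 1 << 15    # P15 is the clock in both phases


def decode_transfer(high_sample, low_sample):
    """Split one transfer (CLK-high sample, CLK-low sample) into its fields."""
    if not high_sample & CLK_BIT:
        raise ValueError("first sample must be taken while CLK is high")
    if low_sample & CLK_BIT:
        raise ValueError("second sample must be taken while CLK is low")
    return {
        "addr": high_sample & ADDR_MASK,
        "is_write": bool(low_sample & WR_BIT),
        "data": low_sample & DATA_MASK,
    }


if __name__ == "__main__":
    # Write of $AB to address $1234: CLK high with address, then CLK low with WR + data.
    print(decode_transfer(0x8000 | 0x1234, WR_BIT | 0xAB))
```

P9..P14 are ignored in the low phase, so the model simply masks them away.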
I'll be implementing the other end of this bus in either a CPLD or an SX; I haven't decided yet. If I do go with the SX, then the SX will have (at 80MHz) 40 cycles per transfer.
I have implemented this in ASM - but I haven't found a way of ensuring that reads/writes always happen in 4 cycle increments.
My code so far is this:
            ORG
            WAITPNE biuclk
            MOV     dira busr           ; 0
            NOP                         ; 1
            NOP                         ; 2
    bloop   MOV     addr, ina           ; 3
            AND     addr, amsk          ; 4
            AND     ina, rwmsk WZ NR    ; 5
    IF_NZ   MOV     dira, busw          ; 6
    IF_Z    MOV     dira, busr          ; 7
    IF_NZ   WRBYTE  ina, addr           ; 8
    IF_Z    RDBYTE  outa, addr          ; 9
                                        ; 0 (HUBOPS take 8 cycles if done, 4 if not)
            MOV     dira, busr          ; 1
            JMP     bloop               ; 2

    addr    LONG    0
    amsk    LONG    $FFFF0000
    rwmask  LONG    x
    biuclk  LONG    x
    busw    LONG    $FF000000
    busr    LONG    $00000000
(I haven't yet tried assembling this)
I was wondering if anyone had any suggestions, optimizations or ways to improve this? You can modify the bus however you want - as I said, I haven't implemented the other end yet.
(FWIW, the Prop can be clocked at either 80MHz or 100MHz, whichever is easier for the implementation.)
Edit: Oh yeah, the Prop is providing the clock, but I would like to keep it at 50% duty cycle if I do go with the SX.
Comments
Also, look at where addr is coming from; it's MOVed in on the first half of the cycle.
Note that you need the MOV from INA to a temporary variable. When you reference INA in a destination field, you access a "shadow register" in COG RAM, not the INA register.
The RDBYTE / WRBYTE is variable timing on the first loop due to the need for synchronization with the HUB. Once it's synchronized, it should take the same number of cycles. I think here you have 32 clocks apart from the RDBYTE/WRBYTE which has a minimum of 7. That makes 39 and the HUB has a 16 clock cycle. The RDBYTE/WRBYTE will catch it on the 3rd cycle, so you should transfer a byte every 48 clocks. At 80MHz, that's a byte every 600ns or 1.67MB/sec. You also have room in the loop to test for CLK being dropped, then exiting from the loop.
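The arithmetic in that estimate can be checked with a few lines of Python. This is only a simplified model: it assumes that once the loop is in lockstep with the hub, each pass costs the loop's clock count rounded up to the next multiple of the 16-clock hub window. The function names are made up for the illustration.

```python
def clocks_per_pass(loop_clocks, hub_window=16):
    """Round the loop length up to the next whole hub window (simplified model)."""
    return -(-loop_clocks // hub_window) * hub_window


def bytes_per_second(clocks, clock_hz, bytes_per_pass=1):
    """Throughput when every pass of the loop moves bytes_per_pass bytes."""
    return bytes_per_pass * clock_hz / clocks


if __name__ == "__main__":
    clocks = clocks_per_pass(32 + 7)           # 39 clocks of work per pass
    period_ns = clocks * 1_000_000_000 / 80_000_000
    rate = bytes_per_second(clocks, 80_000_000)
    print(clocks, "clocks per byte")
    print(period_ns, "ns per byte at 80MHz")
    print(round(rate / 1e6, 2), "MB/sec")
```

Under the same model, trimming the loop to 32 clocks or fewer in total would drop it to two hub windows per byte.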
How do you tell when there's an address on the input lines vs a data transfer?
You can speed up the transfers significantly if you have a fixed block length in that
you can "unroll" the loop and you don't have to transfer the address for each byte.
at the top
Though I still need to work out how to get the counter to tick at the same speed as the code can cycle through it
I can't unroll it, since most microprocessors access memory in an essentially random pattern.
This will be 1 us per 4 bytes, giving you another 1 us (= 20 instructions) to take care of address matters and still reach your 2MB/s target.
Post Edited (deSilva) : 10/20/2007 5:01:43 PM GMT
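For anyone wanting to check the budget quoted above, a small Python sketch of the numbers follows. It assumes an 80MHz clock and 4 clocks per ordinary instruction; the helper names are invented for the example.

```python
CLOCK_HZ = 80_000_000
CLOCKS_PER_INSTRUCTION = 4   # assumption: ordinary (non-hub) instructions


def instruction_budget(time_ns, clock_hz=CLOCK_HZ):
    """How many 4-clock instructions fit into time_ns."""
    clocks = time_ns * clock_hz // 1_000_000_000
    return clocks // CLOCKS_PER_INSTRUCTION


def block_rate(bytes_per_block, data_ns, overhead_ns):
    """Bytes per second when each block costs data_ns plus overhead_ns."""
    return bytes_per_block * 1_000_000_000 / (data_ns + overhead_ns)


if __name__ == "__main__":
    print(instruction_budget(1000), "instructions in 1 us")
    print(block_rate(4, 1000, 1000) / 1e6, "MB/s for 4 bytes every 2 us")
```

So 1 us for the four data bytes plus 1 us of address handling lands exactly on the 2MB/s target, with no slack left over.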
If the other end is intelligent, which it will be, then it can automatically handle 4-byte transfers on behalf of the host processor - though application code must be made aware that writing fewer than 4 consecutive bytes starting at the first byte will be slow, since the controller on the other end has to perform a read to fill in the missing data.
I wrote a block-based implementation. Interestingly, it takes 100 cycles either way if hub access hits the worst-case scenario of 22 cycles. As I don't report back what the Prop is doing, that's the fastest you can go (and reporting back would slow it down anyway).
At 100 cycles (and 80MHz), that's 800k transfers/sec, or 3.2 MB/sec.
I get the impression that the SX would be eminently suitable for the other end of this protocol
Anyway, the code:
- JMP #....
- WAITP... ,#0
I think RCR needs a WC - that was my fault.
(b) You have to find out if the INA data strobes will match. I mean: The XP is putting signals on the bus. Is the code able to catch up with it? You need 13 to 17 clock ticks between the data strobes and a little bit longer up to the next address strobe.
That means the SX should not send data bytes faster than one every 200ns!
Which gives a max of 5 MB/s reduced by the address handling overhead. Now, can you guarantee that the Prop is fast enough back at ADDRCY ?
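A quick Python check of those strobe figures, assuming an 80MHz Propeller clock (function names are illustrative only):

```python
def ticks_to_ns(ticks, clock_hz=80_000_000):
    """Convert a number of system clock ticks to nanoseconds."""
    return ticks * 1_000_000_000 / clock_hz


def max_rate_mb_per_s(byte_period_ns):
    """Upper bound on throughput if one byte arrives every byte_period_ns."""
    return 1000 / byte_period_ns


if __name__ == "__main__":
    for ticks in (13, 16, 17):
        print(ticks, "ticks =", ticks_to_ns(ticks), "ns")
    print(max_rate_mb_per_s(200), "MB/s at one byte per 200 ns")
```

Note that 200ns is 16 ticks at 80MHz, so the 17-tick upper end of the range would actually need slightly more than 200ns between data strobes.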
It will likely be substantially easier to interface the SX to the Propeller than the SX to the 6502-style memory bus that's on the other side.
Oops! I've just realised my code never sets addr!
Leon