Smartpin mode for clocked parallel data?

ke4pjw · 2021-02-08 03:47

Is there a smartpin mode for clocked parallel data? I need to read a 16-bit parallel address bus on the low-to-high transition of a clock pin. Is there a smartpin mode to do this? If not, what is the best way to accomplish it? The bus runs at 1.8 MHz and I am having trouble keeping up using FlexProp.

Thanks,
Terry

evanh · 2021-02-08 04:11

This is where the streamer comes into its own. But it doesn't provide any external clocking, so you are left with instruction counting in pasm to align the timing of the streamer's data pins with an emulated clock on another pin. A smartpin can be used to do the emulated clock.

Be warned, there are lots of gotchas to fall into. Luckily, when you're the bus master, most synchronous buses don't mind the clock being paused while the software deals with conditional handling of the signals.

EDIT: I've tested HyperRAM by bit-bashing the command/address (CA) phase, then engaging the smartpin and streamer to clock the data phase at a higher rate. The program wasn't that complicated to read this way. I later converted to using the smartpin and streamer for everything. That removed some overheads and made the CA phase faster, but getting everything aligned was a real headache.

plasticbrain · 2021-02-08 05:59

I'm trying to do a similar thing: basically emulate a parallel EEPROM with the P2, which will be connected to another CPU. I did some stuff on the P1 about 8 years ago and have forgotten most of it, and disappointingly I'm not getting very far with the P2. It seems like a simple task, and I managed to get basic functionality with Spin2, but it is too slow; I could only get about 500 kHz. Can anyone point to an assembly code example with similar functionality? A fast loop (>2 MHz) that reads a 16-bit address from an I/O port and outputs 8-bit data from an array (256k ROM file) based on that address. I can see the appropriate assembly instructions in the various documents, but I'm finding it a daunting task to get the simplest code running from scratch. If not, I'll keep staring at it.

cgracey · 2021-02-08 06:39

@plasticbrain said:
I'm trying to do a similar thing: basically emulate a parallel EEPROM with the P2, which will be connected to another CPU. I did some stuff on the P1 about 8 years ago and have forgotten most of it, and disappointingly I'm not getting very far with the P2. It seems like a simple task, and I managed to get basic functionality with Spin2, but it is too slow; I could only get about 500 kHz. Can anyone point to an assembly code example with similar functionality? A fast loop (>2 MHz) that reads a 16-bit address from an I/O port and outputs 8-bit data from an array based on that address. I can see the appropriate assembly instructions in the various documents, but I'm finding it a daunting task to get the simplest code running from scratch. If not, I'll keep staring at it.

If you can post your Spin2 code, it would give us an easy job of translating it to assembly language. I'm sure it's just a few lines, right?

evanh · 2021-02-08 06:40

Being the slave with an externally supplied clock is much tougher, since that requires responding to the master within some very tight timing constraints. A smartpin in sync serial receive mode can operate with an external clock signal. It has limitations, but it is likely the best way when the clocks are independent. One limitation is that each sync serial smartpin takes its clock from another pin, and that pin can be at most ±3 pins away, so each clock pin can serve at most six data pins. One hurdle will be untangling the 16x32-bit data from the sixteen data smartpins in a timely manner. There are helper instructions for this, like the MERGE and SPLIT family (MERGEB/MERGEW, SPLITB/SPLITW).
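To make the "untangling" step concrete, here is a host-side Python model (not P2 code) of what has to be computed. It assumes, hypothetically, that each of the sixteen smartpins has collected 32 samples and that bit k of a pin's word is that line's level at the k-th clock edge; the real bit order depends on how the smartpins are configured. Rebuilding the bus words is then a 16x32 bit-matrix transpose. Done one bit at a time like this it would be far too slow on the P2, which is why bit-shuffling helpers such as MERGEB/SPLITB are worth looking at.

```python
def untangle(pin_words, width=16, samples=32):
    """Transpose per-pin shift registers into per-sample bus words.

    pin_words[i]: the 32-bit value captured by the smartpin on data line i.
    Assumed (hypothetical) layout: bit k of pin_words[i] is line i's level
    at the k-th external clock edge.
    Returns `samples` bus words, each `width` bits wide.
    """
    if len(pin_words) != width:
        raise ValueError("need exactly one word per data line")
    bus_words = []
    for k in range(samples):
        word = 0
        for i, w in enumerate(pin_words):
            word |= ((w >> k) & 1) << i   # line i becomes bit i of sample k
        bus_words.append(word)
    return bus_words


def tangle(bus_words, width=16):
    """Inverse: build the per-pin words a capture of `bus_words` would give."""
    pin_words = [0] * width
    for k, word in enumerate(bus_words):
        for i in range(width):
            pin_words[i] |= ((word >> i) & 1) << k
    return pin_words


if __name__ == "__main__":
    sent = [(0x1234 + 97 * k) & 0xFFFF for k in range(32)]
    assert untangle(tangle(sent)) == sent
    print(hex(untangle(tangle(sent))[0]))
```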

If the external clock can also be the Propeller's system clock, by feeding it to the XI pin, then that would be an advantage. That clock source would have to be a stable, non-stop clock, though. In this case there is the possibility of using the streamer again, blindly running the burst transfers and relying on the fact that the two chips are in sync on the same clock.

In both cases, the start and end transaction timings will be critical and may be the biggest hurdle for getting any performance.

evanh · 2021-02-08 06:55

Hmm, I'm really not sure how well any of that will work. Doing the whole thing as a bit-bashed state machine will be the simplest. No doubt that's exactly what you've been doing.

One thing that can help with bit-bashing is the event system. It can eliminate spinning on a pin, replacing the busy loop with a WAIT on a pin change.

cgracey · 2021-02-08 07:13

If we get to respin the chip, I will add a streamer mode which moves data in/out on external clock transitions.

A while back, someone asked for a mode which would write the 32-bit system clock into the FIFO for WRFAST, in order to time-stamp events on pins. This is useful for things like scanning electron microscopes, where you want to correlate electron events with time in order to know their spatial position.

plasticbrain · 2021-02-08 07:58

@cgracey said:
If you can post your Spin2 code, it would give us an easy job of translating it to assembly language. I'm sure it's just a few lines, right?

I want to start with something like this and add to it later (like a ROM file selector in a separate cog, etc.).
I need to handle the CE, RW and PGM signals correctly once I get this working.
Sorry in advance for the terrible code... looks like I can't use formatting either.

```
CON { timing }

  CLK_FREQ = 200_000_000                      ' system freq as a constant
  MS_001   = CLK_FREQ / 1_000                 ' ticks in 1ms
  US_001   = CLK_FREQ / 1_000_000             ' ticks in 1us

  _clkfreq = CLK_FREQ                         ' set system clock

CON { IO }

  RX1      = 63  { I }                        ' programming / debug
  TX1      = 62  { O }

  SF_CS    = 61  { O }                        ' serial flash
  SF_SCK   = 60  { O }
  SF_SDO   = 59  { O }
  SF_SDI   = 58  { I }

  SD_SCK   = 61  { O }                        ' sd card
  SD_CS    = 60  { O }
  SD_SDO   = 59  { O }
  SD_SDI   = 58  { I }

  LED2     = 57  { O }                        ' Eval and Edge LEDs
  LED1     = 56  { O }

'----------------------------------------------------------------------------------
'A0_15 = 31..16 { Input }
'DataBus = 07..00 { In HiZ / Out }
'READ/WRITE = 08 { Input }
'17th address bit (bit 16) = 09 { Input }

  RW       = 08                               ' { Input } Read/Write
  PB6_A16  = 09                               ' 17th address bit
  A14      = 30                               ' Chip ENA = Addrs bit 14 AND 15
  A15      = 31

VAR
  long addrsBusVal
  byte dataBusVal
  byte PGMnot
  byte OEna
  byte CEna

OBJ
  'nickname : "object_name"

PUB main() | fn, p_str, p_dat

  p_str, p_dat := @RomName, @RomData

  repeat
    'PGMnot := PINREAD(RW)
    'OEna := PINREAD(RW) ^ 1
    'CEna := (PINREAD(A14) & PINREAD(A15)) ^ 1

    if PINREAD(RW) == 0                               ' READ - output data to the data bus (CE check to be added)
      addrsBusVal := PINREAD(31..16)                  ' 16-bit address (17 bits needed; PB6_A16 is bit 16, ignored for now)
      PINT(LED1)                                      ' toggle LED
      PINWRITE(07..00, RomData[addrsBusVal + 2])      ' drive the data bus
    else                                              ' WRITE - read data from the data bus and do something
      PINFLOAT(07..00)                                ' data bus Hi-Z for reading
      dataBusVal := PINREAD(07..00)                   ' same bit order as the write above
      PINW(LED1, 0)                                   ' LED off

    'WAITMS(100)                                      ' wait 1/10th of a second, loop

DAT

RomName   byte  "romfile.bin", 0
RomData   file  "romfile.bin"
```
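As a cross-check of the pin map in the listing above, here is a Python model (not Spin2) of the decode the loop performs on one snapshot of the P0..P31 input register. The pin numbers come from the CON block (A0..A15 on P16..P31, R/W on P8, PB6_A16 on P9). The chip-select rule follows the commented-out `CEna` line (selected when A14 and A15 are both high), and the `+2` offset mirrors `RomData[addrsBusVal+2]`, whose purpose the post doesn't explain. The function names are hypothetical. Because the address sits on a 16-pin boundary, extracting it is a single shift and mask.

```python
RW_PIN = 8          # RW = 08 in the CON block
A16_PIN = 9         # PB6_A16 = 09, the 17th address bit
ADDR_LSB_PIN = 16   # A0..A15 on P16..P31, as read by PINREAD(31..16)


def decode(ina, use_a16=False):
    """Split one 32-bit input snapshot into (is_read, address, selected)."""
    is_read = ((ina >> RW_PIN) & 1) == 0        # the loop treats RW == 0 as a read
    address = (ina >> ADDR_LSB_PIN) & 0xFFFF    # one shift and mask: word-aligned pins
    if use_a16:                                 # optional: fold in the 17th bit
        address |= ((ina >> A16_PIN) & 1) << 16
    selected = (address & 0xC000) == 0xC000     # A14 AND A15, per the commented-out CEna line
    return is_read, address, selected


def data_out(rom, address, offset=2):
    """Byte to drive on P7..P0; `offset` mirrors RomData[addrsBusVal+2]."""
    return rom[address + offset]


if __name__ == "__main__":
    rom = bytes(i % 256 for i in range(0x10100))
    is_read, addr, sel = decode(0xC123 << 16)
    print(is_read, hex(addr), sel, hex(data_out(rom, addr)))
```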

cgracey · 2021-02-08 10:40

Plasticbrain, here is some PASM code that I wrote that should do the job very quickly:

You'll need to add the _clkfreq setting.

evanh · 2021-02-08 10:51

Nice. Might be good for 8 MHz with the prop2 running at 320 MHz.

Yanomani · 2021-02-08 11:34

@cgracey said:
If we get to respin the chip, I will add a streamer mode which moves data in/out on external clock transitions.

A while back, someone asked for a mode which would write the 32-bit system clock into the FIFO for WRFAST, in order to time-stamp events on pins. This is useful for things like scanning electron microscopes, where you want to correlate electron events with time in order to know their spatial position.

IIRC, CT[63:0] is actually generated within the hub circuitry. So, if a pair of cogs could be put in synchrony, that special mode could be controlled by the even-numbered one (the way the even-numbered pin's circuitry takes control in a differential pin pair), and only the appropriate write-command signals would need to be controlled and routed, since the timestamp data would already be present within the hub.

It would make a kind of hub-op-sharing pair possible, with some extra features, like one of the cogs doing hub writes while the other is either writing or reading, but now timestamped by its peer.

It would also allow the two cogs to use different bit widths simultaneously, greatly reducing hub memory usage on the data-only channel that carries no timestamps.

Only thoughts...

evanh · 2021-02-08 12:07

What's the purpose of pairing? Timestamping doesn't need it. There is already COGATN for syncing two cogs.

plasticbrain · 2021-02-08 12:29

@cgracey said:

Plasticbrain, here is some PASM code that I wrote that should do the job very quickly:

Wow thanks for that! I'm going over that code to make sure I understand it in detail. Cheers.

Cluso99 · 2021-02-08 12:55

@evanh said:
What's the purpose of pairing? Timestamping doesn't need it. There is already COGATN for syncing two cogs.

And there is sharing of LUT RAM between cog pairs, if enabled. It may not be relevant here, but it's worth a reminder.

BTW, I think an SPI-style option would be nicer if the silicon gets an update. A discussion for much later, though.

ManAtWork · 2021-02-08 12:57

@ke4pjw said:
Is there a smartpin mode for clocked parallel data? I need to read a 16-bit parallel address bus on the low-to-high transition of a clock pin. Is there a smartpin mode to do this? If not, what is the best way to accomplish it? The bus runs at 1.8 MHz and I am having trouble keeping up using FlexProp.

1.8 MHz is not very fast. If you align the 16 address signals on a 16-pin boundary, it would be easy to read all the pins in an ISR triggered by a pin rising-edge event. The P2 is really fast at responding to interrupt requests. I think it would only take around 10 clocks to enter the ISR, read the pins and return from the ISR, so at 180 MHz it's done in about 55 ns.
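The timing figures in this thread can be checked with some quick arithmetic. A minimal sketch, assuming two system clocks per instruction (true for most P2 instructions) and ignoring hub-access stalls: at 180 MHz a 1.8 MHz bus leaves 100 clocks per bus cycle, a ~10-clock ISR takes about 55.6 ns, and evanh's 8 MHz bus at 320 MHz leaves 40 clocks. The helper names are illustrative.

```python
def clocks_per_cycle(sysclk_hz: float, bus_hz: float) -> float:
    """System clocks available per bus cycle."""
    return sysclk_hz / bus_hz


def clocks_to_ns(clocks: float, sysclk_hz: float) -> float:
    """Convert a clock count to nanoseconds at the given system clock."""
    return clocks * 1e9 / sysclk_hz


def instruction_budget(sysclk_hz: float, bus_hz: float, clocks_per_instr: int = 2) -> int:
    """Whole instructions per bus cycle, assuming a fixed cost per instruction."""
    return int(clocks_per_cycle(sysclk_hz, bus_hz) // clocks_per_instr)


if __name__ == "__main__":
    # 180 MHz / 1.8 MHz (this post), 200 MHz / 2 MHz (plasticbrain), 320 MHz / 8 MHz (evanh)
    for sysclk, bus in [(180e6, 1.8e6), (200e6, 2e6), (320e6, 8e6)]:
        print(f"{sysclk / 1e6:.0f} MHz sysclk, {bus / 1e6:g} MHz bus: "
              f"{clocks_per_cycle(sysclk, bus):.0f} clocks, "
              f"~{instruction_budget(sysclk, bus)} instructions per cycle")
    print(f"10-clock ISR at 180 MHz: {clocks_to_ns(10, 180e6):.1f} ns")
```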

cgracey · 2021-02-08 15:12

Using two cogs' streamers in tandem is an interesting idea. A lot can be done in that way with the current silicon, even.

ke4pjw · 2021-02-08 16:33

@plasticbrain said:
I'm trying to do a similar thing: basically emulate a parallel EEPROM with the P2, which will be connected to another CPU. I did some stuff on the P1 about 8 years ago and have forgotten most of it, and disappointingly I'm not getting very far with the P2. It seems like a simple task, and I managed to get basic functionality with Spin2, but it is too slow; I could only get about 500 kHz. Can anyone point to an assembly code example with similar functionality? A fast loop (>2 MHz) that reads a 16-bit address from an I/O port and outputs 8-bit data from an array (256k ROM file) based on that address. I can see the appropriate assembly instructions in the various documents, but I'm finding it a daunting task to get the simplest code running from scratch. If not, I'll keep staring at it.

I was able to get this to work on the P2. The problem was getting it to do anything else: there was no time left for a decision tree once an address of interest was observed.

I could only get it to work in fastspin.

I will post code later today.

Sounds like we have similar projects.

https://www.facebook.com/1441831958/videos/10226320447944536/

@cgracey Wow! Looking at your assembly, you stubbed out the core of what I wanted to do. I need to research WAITSE; I didn't recall seeing it. I felt that the problem I have been experiencing was holding the P2's data bus outputs until the equivalent of CE falls before going back to float low, and that the source of that problem was the time required to recognize valid address data. Thank you for posting this!

msrobots · 2021-02-08 22:05

I might remember wrong, but one of the changes in rev A was adding support for reading/streaming in ADC data with an external clock.

And if I remember correctly, Chip confirmed that when the ADC bit is cleared, it would just stream in parallel bits with an external clock. Maybe I did not really get what he was stating; that happens quite often, and I have had no time yet to confirm it. I asked for this for my ring-buffer concept but haven't had the time to work on it, so I never tried.

Mike

plasticbrain · 2021-02-08 22:21

@ke4pjw said:

I could only get it to work in fastspin.
I will post code later today.
Sounds like we have similar projects.
https://www.facebook.com/1441831958/videos/10226320447944536/

Haha yep, had a look at your video, same idea with a different machine! I've joined that group so I can keep up with what you're doing. Looks like you've cracked basic functionality, me on the other hand, not so much. I'm certainly stress testing the P2's ability to tolerate data bus contention though! Hopefully I still have an undamaged P2 chip when I go back to it today.

ke4pjw · 2021-02-09 01:36

@plasticbrain said:
Haha yep, had a look at your video, same idea with a different machine! I've joined that group so I can keep up with what you're doing. Looks like you've cracked basic functionality, me on the other hand, not so much. I'm certainly stress testing the P2's ability to tolerate data bus contention though! Hopefully I still have an undamaged P2 chip when I go back to it today.

I think using interrupts is going to be the way to go. We are going to have to drop down to PASM2 to do it, which is OK. Digging through the docs for SETSE#, there isn't much there. It sets the event configuration, which appears to be taken from the D operand. I'm sure there is shorthand in there for experienced assembly folks; I just can't make sense of it yet. I suspect that is where the IRQ gets configured, and WAITSE# is what gets executed based on the criteria configured by SETSE#.

I am going to search the forums for a quick tutorial on P2 interrupts.

ke4pjw · 2021-02-09 01:51

Wait, events aren't interrupts. Huh. What are events?

AJL · 2021-02-09 02:33

Events are bigger than just interrupts. Page 44 of the P2 documentation gives a lot of information, although it could be fleshed out further.

Selectable events are only 4 of the 15 possible interrupt sources. Page 49 of the P2 documentation gives the full list of interrupt sources.
Interrupts suffer latency in responding that is dependent on what code is executing at present as some operations inherently stall interrupts. The Interrupt is set up by setting the relevant interrupt jump vector location and executing a SETINT.

WAITSE# is for waiting for the Selectable event flags (instead of interrupts) and allows essentially instant response to the event, without any interrupt latency, but stalls all other processing.

POLLSE#, as the name suggests, allows polling of the event flags. This gives periodic response windows without interrupt latency while allowing other processing in the cog. The gap between polls can be controlled by the programmer adding extra polls in longer loops.
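The POLLSE-versus-WAITSE trade-off comes down to how long an event can sit in its flag before the code looks at it. The Python model below is a simplification, not cycle-accurate P2 behaviour: it assumes a loop of fixed length in clocks, with polls at programmer-chosen offsets inside it, and computes for each event time the clocks until the next poll. The worst case is the largest gap between polls, which is why adding polls to a long loop helps. The function name and parameters are hypothetical.

```python
def poll_latencies(event_times, poll_offsets, loop_clocks):
    """Clocks from each event until the next poll that can see it.

    event_times:  clock times at which the selectable event fires
    poll_offsets: offsets (0 <= offset < loop_clocks) within one loop pass
                  where a POLLSE-style check runs
    loop_clocks:  length of one loop pass in clocks
    """
    offsets = sorted(poll_offsets)
    result = []
    for t in event_times:
        base = (t // loop_clocks) * loop_clocks          # start of the pass containing t
        later = [base + o for o in offsets if base + o >= t]
        # If no poll remains in this pass, the first poll of the next pass sees it.
        next_poll = later[0] if later else base + loop_clocks + offsets[0]
        result.append(next_poll - t)
    return result


if __name__ == "__main__":
    events = range(40)
    print("one poll per 40-clock loop, worst case:",
          max(poll_latencies(events, [0], 40)))
    print("two polls per 40-clock loop, worst case:",
          max(poll_latencies(events, [0, 20], 40)))
```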

JSE# and JNSE# cause branches when the event has (or hasn't) occurred.

For a situation where an output bus must be released by a known time after an external event, you could:
1. Wait for the external event trigger (WAITSE 1, 2, 3, or 4 or WAITINT 1, 2, or 3 if a pattern match is required)
2. Calculate and set the bus clearance timer (ADDCT 1, 2, or 3) (deterministic)
3. Calculate the response value (possibly non-deterministic)
4. Drive the response to the bus
5. Wait for time-out (WAITCT 1, 2, or 3) and clear bus (deterministic bus clearing)
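The five steps above can be modelled with a free-running counter standing in for the P2's CT. This Python sketch is an illustration only: the names are hypothetical, the counter's 32-bit wraparound is ignored, and the clock counts are placeholders rather than real instruction timings. It shows the property the sequence is after: the release time depends only on the trigger time and the hold interval, not on how long the lookup in step 3 took, as long as the lookup finishes first.

```python
from dataclasses import dataclass


@dataclass
class BusCycle:
    trigger: int   # counter value when the event fired (step 1)
    drive: int     # when the response appears on the bus (step 4)
    release: int   # when the bus is floated again (step 5)


def respond(trigger: int, lookup_clocks: int, hold_clocks: int) -> BusCycle:
    # Step 2: set the release deadline straight away, like ADDCT1 from the trigger time.
    deadline = trigger + hold_clocks
    # Steps 3-4: compute the response (variable time), then drive it.
    drive = trigger + lookup_clocks
    if drive >= deadline:
        raise RuntimeError("response not ready before the bus must be released")
    # Step 5: waiting on the deadline (like WAITCT1) makes the release time fixed.
    return BusCycle(trigger, drive, deadline)


if __name__ == "__main__":
    fast = respond(trigger=1000, lookup_clocks=12, hold_clocks=50)
    slow = respond(trigger=1000, lookup_clocks=40, hold_clocks=50)
    print(fast)
    print(slow)
```

Whatever the lookup costs, both calls release the bus at the same counter value; only the drive time moves.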

potatohead · 2021-02-09 02:36

Interrupts triggered by events!

They are all local to the cog too. So you can have multiple cogs responding to different, or the same event.

evanh · 2021-02-09 04:32
