Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i

David Betz · 2015-09-26 16:00

Chip,

What loader protocol is required to load code into a P2? Is it the same as it was with P2-hot? I'd like to make a loader that will run on the Mac and Linux without having to resort to using wine to run PNut.exe.

Thanks,
David

mindrobots · 2015-09-26 16:28

Yes, please! A more portable loader. (I'm not slacking, I'm typing this from my lawn mower!)

potatohead · 2015-09-26 17:50

Chores? Pfffft. That's nothing. I'm outta State on contract, until the 2nd. My FPGA boards miss me and require feeding.

Have fun. I'll be watching with great interest.

David Betz · 2015-09-26 18:53

It's been a while since I've looked at the new P2 instruction set. What does "@" mean? Is that relative addressing? Also, there are lots of CALLx (CALLA, CALLB, CALLD) instructions. Where do they put their return addresses? Are there still PTRA and PTRB registers? I assume CALL still works like before.

What is the full register set and addresses?

cgracey · 2015-09-26 19:08

Rayman wrote: »

I think I almost understand the blinky example, but some things look magic...

What does "orgh 1" do? Why not just orgh ? Does this code start at $1000 (I think so)?

The last two lines with org and res x are hurting my brain...
Does the compiler load anything after "org" into cog before starting?
Or does this only work for "res" reserved space that doesn't need initializing?

That ORGH 1 is there because that's where the loader jumps into your code. It's that non-aligned hub exec below $1000 that people hate. I just haven't changed it yet. I kind of don't want to, because it allows most efficient use of memory. You could always just put a JMP #$1000 after it and pretend it's not really happening.

That ORG + RES business was just a quick way to get some symbolic cog registers declared. It doesn't generate any code. Each blinking cog will use its own instance of those registers.

cgracey · 2015-09-26 19:14

David Betz wrote: »

Chip,

What loader protocol is required to load code into a P2? Is it the same as it was with P2-hot? I'd like to make a loader that will run on the Mac and Linux without having to resort to using wine to run PNut.exe.

Thanks,
David

The ROM_Booter.spin is what runs on boot. It doesn't handle anything, yet, but serial loading. Then, MainLoader.spin gets downloaded by PNut.exe and it receives all the memory data and JMPs to your app. The last three longs get customized by PNut.exe for the board's RAM size and speed.
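
For anyone writing a portable loader, the customization step described above could be sketched roughly as follows. This is only an illustration in Python: the function name `patch_trailing_longs` and the example values are hypothetical, and nothing here says what each of the three longs means or in what order they appear. The one layout assumption is that longs are stored little-endian, as on the Propeller.

```python
import struct

def patch_trailing_longs(image: bytes, values: tuple) -> bytes:
    """Return a copy of image with its last three 32-bit longs replaced."""
    if len(values) != 3:
        raise ValueError("expected exactly three longs")
    if len(image) < 12 or len(image) % 4 != 0:
        raise ValueError("image must be a whole number of longs, at least three")
    # '<3I' packs three unsigned 32-bit values in little-endian order.
    return image[:-12] + struct.pack("<3I", *values)

# Hypothetical example: a 16-byte dummy image and placeholder values.
dummy = bytes(16)
patched = patch_trailing_longs(dummy, (0x00080000, 80_000_000, 0))
```

A real loader would read the assembled MainLoader binary from disk instead of using a dummy image, and would take the three values from whatever the board requires.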

potatohead · 2015-09-26 19:15

Not everyone hates it...

I think it's a perfectly great idea that makes good use of the addressing scheme.

I really, really, really don't think we should change it. Having that region under $1000, non aligned as it is, makes for a perfect boot code area.

If I were king, I would add a write protect bit and some instruction state type latch to that bit so the region can be loaded from ROM, then kept from being trashed easily to simulate having a real ROM there. This is one pretty great feature P2 "hot" had, and I think we should keep it in this P2.

The beauty of doing it this way is there being no need to compromise on ROM / RAM like we had to in the "hot" chip. In that one, we all wanted to keep the ROM small, because it cost us RAM.

In this one, if we were to add a write protect bit for that region, or maybe the whole 16K region ($4000), debug, dev, monitor, and whatever else could go there. The chip would ship with some stuff, and a binary image could ship with other stuff. We get the choice of default ROM, use it all as RAM, or a newer "rom" and I think people would take great advantage of this facility to offer up tools, utilities, etc...

Say one is done with the development and or just wants to use the RAM as RAM. Simply include whatever you want in your binary image, and write over the area, using it as a data / font whatever buffer. Ignore the non-aligned code feature and carry on.

David Betz · 2015-09-26 19:16

cgracey wrote: »

David Betz wrote: »

Chip,

What loader protocol is required to load code into a P2? Is it the same as it was with P2-hot? I'd like to make a loader that will run on the Mac and Linux without having to resort to using wine to run PNut.exe.

Thanks,
David

The ROM_Booter.spin is what runs on boot. It doesn't handle anything, yet, but serial loading. Then, MainLoader.spin gets downloaded by PNut.exe and it receives all the memory data and JMPs to your app. The last three longs get customized by PNut.exe for the board's RAM size and speed.

Okay, so ROM_Booter.spin is the only thing that is fixed. MainLoader.spin is the second-stage loader and could be different for different loader implementations, right?

cgracey · 2015-09-26 19:18

David Betz wrote: »

It's been a while since I've looked at the new P2 instruction set. What does "@" mean? Is that relative addressing? Also, there are lots of CALLx (CALLA, CALLB, CALLD) instructions. Where do they put their return addresses? Are there still PTRA and PTRB registers? I assume CALL still works like before.

What is the full register set and addresses?

CALLA uses PTRA
CALLB uses PTRB
CALL uses the internal 8-level hardware stack
CALLD is the link instruction

# is absolute 20-bit address
@ is relative 20-bit address, 9-bit for D,S branches

David Betz · 2015-09-26 19:20

cgracey wrote: »

David Betz wrote: »

It's been a while since I've looked at the new P2 instruction set. What does "@" mean? Is that relative addressing? Also, there are lots of CALLx (CALLA, CALLB, CALLD) instructions. Where do they put their return addresses? Are there still PTRA and PTRB registers? I assume CALL still works like before.

What is the full register set and addresses?

CALLA uses PTRA
CALLB uses PTRB
CALL uses the internal 8-level hardware stack
CALLD is the link instruction

# is absolute 20-bit address
@ is relative 20-bit address, 9-bit for D,S branches

Thanks! What are the semantics of PTRA and PTRB? Are they stack pointers that auto increment and decrement?

cgracey · 2015-09-26 19:21

David Betz wrote: »

cgracey wrote: »

David Betz wrote: »

Chip,

What loader protocol is required to load code into a P2? Is it the same as it was with P2-hot? I'd like to make a loader that will run on the Mac and Linux without having to resort to using wine to run PNut.exe.

Thanks,
David

The ROM_Booter.spin is what runs on boot. It doesn't handle anything, yet, but serial loading. Then, MainLoader.spin gets downloaded by PNut.exe and it receives all the memory data and JMPs to your app. The last three longs get customized by PNut.exe for the board's RAM size and speed.

Okay, so ROM_Booter.spin is the only thing that is fixed. MainLoader.spin is the second-stage loader and could be different for different loader implementations, right?

Correct.

cgracey · 2015-09-26 19:26

David Betz wrote: »

cgracey wrote: »

David Betz wrote: »

It's been a while since I've looked at the new P2 instruction set. What does "@" mean? Is that relative addressing? Also, there are lots of CALLx (CALLA, CALLB, CALLD) instructions. Where do they put their return addresses? Are there still PTRA and PTRB registers? I assume CALL still works like before.

What is the full register set and addresses?

CALLA uses PTRA
CALLB uses PTRB
CALL uses the internal 8-level hardware stack
CALLD is the link instruction

# is absolute 20-bit address
@ is relative 20-bit address, 9-bit for D,S branches

Thanks! What are the semantics of PTRA and PTRB? Are they stack pointers that auto increment and decrement?

Yes. And any offset expressed is scaled by the word size.
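
A minimal sketch of what "scaled by the word size" amounts to, written in Python rather than PASM. The function names and the example pointer values are made up for illustration; only the byte, word, and long sizes are modelled, and no hardware side effects are simulated.

```python
# Bytes per element for each access size.
SCALE = {"byte": 1, "word": 2, "long": 4}

def effective_address(ptr: int, index: int, size: str) -> int:
    """Address used by PTRx[INDEX]: the index is multiplied by the element size."""
    return ptr + index * SCALE[size]

def post_increment(ptr: int, size: str) -> tuple:
    """PTRx++ : use the current pointer, then advance it by one element."""
    return ptr, ptr + SCALE[size]

def pre_decrement(ptr: int, size: str) -> tuple:
    """--PTRx : step the pointer back by one element, then use it."""
    new_ptr = ptr - SCALE[size]
    return new_ptr, new_ptr
```

Used as a pair, `post_increment` and `pre_decrement` behave like a push and a pop on an upward-growing stack: pushing a long at $1000 leaves the pointer at $1004, and popping returns it to $1000.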

David Betz · 2015-09-26 19:28

cgracey wrote: »

David Betz wrote: »

cgracey wrote: »

David Betz wrote: »

It's been a while since I've looked at the new P2 instruction set. What does "@" mean? Is that relative addressing? Also, there are lots of CALLx (CALLA, CALLB, CALLD) instructions. Where do they put their return addresses? Are there still PTRA and PTRB registers? I assume CALL still works like before.

What is the full register set and addresses?

CALLA uses PTRA
CALLB uses PTRB
CALL uses the internal 8-level hardware stack
CALLD is the link instruction

# is absolute 20-bit address
@ is relative 20-bit address, 9-bit for D,S branches

Thanks! What are the semantics of PTRA and PTRB? Are they stack pointers that auto increment and decrement?

Yes. And any offset expressed is scaled by the word size.

Ah, okay. So the new P2 instruction set still has all of the PTRx addressing modes of P2-hot? I didn't realize that.

David Betz · 2015-09-26 19:44

Is this all still valid?

PTRx expressions:

    INDEX = -32..+31 for simple offsets, 0..31 for ++'s, or 0..32 for --'s
    SCALE = 1 for byte, 2 for word, 4 for long, or 32 for wide

    S = 0 for PTRA, 1 for PTRB
    U = 0 to keep PTRx same, 1 to update PTRx
    P = 0 to use PTRx + INDEX*SCALE, 1 to use PTRx (post-modify)
    NNNNNN = INDEX
    nnnnnn = -INDEX


    SUPNNNNNN     PTR expression
    -----------------------------------------------------------------------------
    000000000     PTRA              'use PTRA
    100000000     PTRB              'use PTRB
    011000001     PTRA++            'use PTRA,                PTRA += SCALE
    111000001     PTRB++            'use PTRB,                PTRB += SCALE
    011111111     PTRA--            'use PTRA,                PTRA -= SCALE
    111111111     PTRB--            'use PTRB,                PTRB -= SCALE
    010000001     ++PTRA            'use PTRA + SCALE,        PTRA += SCALE
    110000001     ++PTRB            'use PTRB + SCALE,        PTRB += SCALE
    010111111     --PTRA            'use PTRA - SCALE,        PTRA -= SCALE
    110111111     --PTRB            'use PTRB - SCALE,        PTRB -= SCALE

    000NNNNNN     PTRA[INDEX]       'use PTRA + INDEX*SCALE
    100NNNNNN     PTRB[INDEX]       'use PTRB + INDEX*SCALE
    011NNNNNN     PTRA++[INDEX]     'use PTRA,                PTRA += INDEX*SCALE
    111NNNNNN     PTRB++[INDEX]     'use PTRB,                PTRB += INDEX*SCALE
    011nnnnnn     PTRA--[INDEX]     'use PTRA,                PTRA -= INDEX*SCALE
    111nnnnnn     PTRB--[INDEX]     'use PTRB,                PTRB -= INDEX*SCALE
    010NNNNNN     ++PTRA[INDEX]     'use PTRA + INDEX*SCALE,  PTRA += INDEX*SCALE
    110NNNNNN     ++PTRB[INDEX]     'use PTRB + INDEX*SCALE,  PTRB += INDEX*SCALE
    010nnnnnn     --PTRA[INDEX]     'use PTRA - INDEX*SCALE,  PTRA -= INDEX*SCALE
    110nnnnnn     --PTRB[INDEX]     'use PTRB - INDEX*SCALE,  PTRB -= INDEX*SCALE

cgracey · 2015-09-26 20:10

David Betz wrote: »

Is this all still valid?

PTRx expressions:

    INDEX = -32..+31 for simple offsets, 0..31 for ++'s, or 0..32 for --'s
    SCALE = 1 for byte, 2 for word, 4 for long, or 32 for wide

    S = 0 for PTRA, 1 for PTRB
    U = 0 to keep PTRx same, 1 to update PTRx
    P = 0 to use PTRx + INDEX*SCALE, 1 to use PTRx (post-modify)
    NNNNNN = INDEX
    nnnnnn = -INDEX


    SUPNNNNNN     PTR expression
    -----------------------------------------------------------------------------
    000000000     PTRA              'use PTRA
    100000000     PTRB              'use PTRB
    011000001     PTRA++            'use PTRA,                PTRA += SCALE
    111000001     PTRB++            'use PTRB,                PTRB += SCALE
    011111111     PTRA--            'use PTRA,                PTRA -= SCALE
    111111111     PTRB--            'use PTRB,                PTRB -= SCALE
    010000001     ++PTRA            'use PTRA + SCALE,        PTRA += SCALE
    110000001     ++PTRB            'use PTRB + SCALE,        PTRB += SCALE
    010111111     --PTRA            'use PTRA - SCALE,        PTRA -= SCALE
    110111111     --PTRB            'use PTRB - SCALE,        PTRB -= SCALE

    000NNNNNN     PTRA[INDEX]       'use PTRA + INDEX*SCALE
    100NNNNNN     PTRB[INDEX]       'use PTRB + INDEX*SCALE
    011NNNNNN     PTRA++[INDEX]     'use PTRA,                PTRA += INDEX*SCALE
    111NNNNNN     PTRB++[INDEX]     'use PTRB,                PTRB += INDEX*SCALE
    011nnnnnn     PTRA--[INDEX]     'use PTRA,                PTRA -= INDEX*SCALE
    111nnnnnn     PTRB--[INDEX]     'use PTRB,                PTRB -= INDEX*SCALE
    010NNNNNN     ++PTRA[INDEX]     'use PTRA + INDEX*SCALE,  PTRA += INDEX*SCALE
    110NNNNNN     ++PTRB[INDEX]     'use PTRB + INDEX*SCALE,  PTRB += INDEX*SCALE
    010nnnnnn     --PTRA[INDEX]     'use PTRA - INDEX*SCALE,  PTRA -= INDEX*SCALE
    110nnnnnn     --PTRB[INDEX]     'use PTRB - INDEX*SCALE,  PTRB -= INDEX*SCALE

That is all the same, but with one difference. Now, there are only five bits of offset, so you have a +15 to -16 range. Those SUP bits have moved down by one. Now, if the MSB is zero, you access immediate addresses 00 to FF.

Roy Eltham · 2015-09-26 21:29

potatohead wrote: »

Not everyone hates it... I think it's a perfectly great idea that makes good use of the addressing scheme.

I really, really, really don't think we should change it. Having that region under $1000, non aligned as it is, makes for a perfect boot code area.

If I were king, I would add a write protect bit and some instruction state type latch to that bit so the region can be loaded from ROM, then kept from being trashed easily to simulate having a real ROM there. This is one pretty great feature P2 "hot" had, and I think we should keep it in this P2.

The beauty of doing it this way is there being no need to compromise on ROM / RAM like we had to in the "hot" chip. In that one, we all wanted to keep the ROM small, because it cost us RAM.

In this one, if we were to add a write protect bit for that region, or maybe the whole 16K region ($4000), debug, dev, monitor, and whatever else could go there. The chip would ship with some stuff, and a binary image could ship with other stuff. We get the choice of default ROM, use it all as RAM, or a newer "rom" and I think people would take great advantage of this facility to offer up tools, utilities, etc...

Say one is done with the development and or just wants to use the RAM as RAM. Simply include whatever you want in your binary image, and write over the area, using it as a data / font whatever buffer. Ignore the non-aligned code feature and carry on.

The kluge to make hub exec work offset by 1 at addresses below $1000 is just that... a kluge. It's obviously a hack, and looks really lame to have in hardware as the way things work. Also, it doesn't give any advantage at all, I don't understand why anyone thinks it does?!?!

Having the 16k ROM be loaded and then jumping to $0001 instead of $1000 is the difference in practice. I don't see how it uses memory more efficiently at all. There is still 16k of ROM loaded into the first 16k of hub. The only difference is where the entry point is. However, things now work differently when you branch to addresses below $1000, depending on whether your branch target is aligned or not. This applies all the time, and it's odd and makes little sense.

Things should work in straight forward easy to describe ways, not with weird gotcha kluges.

David Betz · 2015-09-26 21:40

Roy Eltham wrote: »

potatohead wrote: »

Not everyone hates it... I think it's a perfectly great idea that makes good use of the addressing scheme.

I really, really, really don't think we should change it. Having that region under $1000, non aligned as it is, makes for a perfect boot code area.

If I were king, I would add a write protect bit and some instruction state type latch to that bit so the region can be loaded from ROM, then kept from being trashed easily to simulate having a real ROM there. This is one pretty great feature P2 "hot" had, and I think we should keep it in this P2.

The beauty of doing it this way is there being no need to compromise on ROM / RAM like we had to in the "hot" chip. In that one, we all wanted to keep the ROM small, because it cost us RAM.

In this one, if we were to add a write protect bit for that region, or maybe the whole 16K region ($4000), debug, dev, monitor, and whatever else could go there. The chip would ship with some stuff, and a binary image could ship with other stuff. We get the choice of default ROM, use it all as RAM, or a newer "rom" and I think people would take great advantage of this facility to offer up tools, utilities, etc...

Say one is done with the development and or just wants to use the RAM as RAM. Simply include whatever you want in your binary image, and write over the area, using it as a data / font whatever buffer. Ignore the non-aligned code feature and carry on.

The kluge to make hub exec work offset by 1 at addresses below $1000 is just that... a kluge. It's obviously a hack, and looks really lame to have in hardware as the way things work. Also, it doesn't give any advantage at all, I don't understand why anyone thinks it does?!?!

Having the 16k ROM be loaded and then jumping to $0001 instead of $1000 is the difference in practice. I don't see how it uses memory more efficiently at all. There is still 16k of ROM loaded into the first 16k of hub. The only difference is where the entry point is. However, things now work differently when you branch to addresses below $1000, depending on whether your branch target is aligned or not. This applies all the time, and it's odd and makes little sense.

Things should work in straight forward easy to describe ways, not with weird gotcha kluges.

Ummm... Is there any reason that the ROM has to be loaded starting at $0000? Why can't it be loaded starting at $1000 in the first place?

Roy Eltham · 2015-09-26 21:48

David,
That works fine too. Not sure why Chip thinks it needs to be loaded at $0 and make this kluge to use that memory for hubexec.

David Betz · 2015-09-26 22:20

cgracey wrote: »

David Betz wrote: »

Is this all still valid?

PTRx expressions:

    INDEX = -32..+31 for simple offsets, 0..31 for ++'s, or 0..32 for --'s
    SCALE = 1 for byte, 2 for word, 4 for long, or 32 for wide

    S = 0 for PTRA, 1 for PTRB
    U = 0 to keep PTRx same, 1 to update PTRx
    P = 0 to use PTRx + INDEX*SCALE, 1 to use PTRx (post-modify)
    NNNNNN = INDEX
    nnnnnn = -INDEX


    SUPNNNNNN     PTR expression
    -----------------------------------------------------------------------------
    000000000     PTRA              'use PTRA
    100000000     PTRB              'use PTRB
    011000001     PTRA++            'use PTRA,                PTRA += SCALE
    111000001     PTRB++            'use PTRB,                PTRB += SCALE
    011111111     PTRA--            'use PTRA,                PTRA -= SCALE
    111111111     PTRB--            'use PTRB,                PTRB -= SCALE
    010000001     ++PTRA            'use PTRA + SCALE,        PTRA += SCALE
    110000001     ++PTRB            'use PTRB + SCALE,        PTRB += SCALE
    010111111     --PTRA            'use PTRA - SCALE,        PTRA -= SCALE
    110111111     --PTRB            'use PTRB - SCALE,        PTRB -= SCALE

    000NNNNNN     PTRA[INDEX]       'use PTRA + INDEX*SCALE
    100NNNNNN     PTRB[INDEX]       'use PTRB + INDEX*SCALE
    011NNNNNN     PTRA++[INDEX]     'use PTRA,                PTRA += INDEX*SCALE
    111NNNNNN     PTRB++[INDEX]     'use PTRB,                PTRB += INDEX*SCALE
    011nnnnnn     PTRA--[INDEX]     'use PTRA,                PTRA -= INDEX*SCALE
    111nnnnnn     PTRB--[INDEX]     'use PTRB,                PTRB -= INDEX*SCALE
    010NNNNNN     ++PTRA[INDEX]     'use PTRA + INDEX*SCALE,  PTRA += INDEX*SCALE
    110NNNNNN     ++PTRB[INDEX]     'use PTRB + INDEX*SCALE,  PTRB += INDEX*SCALE
    010nnnnnn     --PTRA[INDEX]     'use PTRA - INDEX*SCALE,  PTRA -= INDEX*SCALE
    110nnnnnn     --PTRB[INDEX]     'use PTRB - INDEX*SCALE,  PTRB -= INDEX*SCALE

That is all the same, but with one difference. Now, there are only five bits of offset, so you have a +15 to -16 range. Those SUP bits have moved down by one. Now, if the MSB is zero, you access immediate addresses 00 to FF.

Thanks, Chip. So the encoding now looks like this? And the I bit in the instruction determines whether the S field is interpreted as a register or as shown below?

PTRx expressions:

    INDEX = -16..+15 for simple offsets, 0..15 for ++'s, or 0..16 for --'s
    SCALE = 1 for byte, 2 for word, 4 for long, or 32 for wide

    S = 0 for PTRA, 1 for PTRB
    U = 0 to keep PTRx same, 1 to update PTRx
    P = 0 to use PTRx + INDEX*SCALE, 1 to use PTRx (post-modify)
    NNNNN = INDEX
    nnnnn = -INDEX


    ISUPNNNNN     PTR expression
    -----------------------------------------------------------------------------
    0NNNNNNNN     #NNNNNNNN

    100000000     PTRA              'use PTRA
    110000000     PTRB              'use PTRB
    101100001     PTRA++            'use PTRA,                PTRA += SCALE
    111100001     PTRB++            'use PTRB,                PTRB += SCALE
    101111111     PTRA--            'use PTRA,                PTRA -= SCALE
    111111111     PTRB--            'use PTRB,                PTRB -= SCALE
    101000001     ++PTRA            'use PTRA + SCALE,        PTRA += SCALE
    111000001     ++PTRB            'use PTRB + SCALE,        PTRB += SCALE
    101011111     --PTRA            'use PTRA - SCALE,        PTRA -= SCALE
    111011111     --PTRB            'use PTRB - SCALE,        PTRB -= SCALE

    1000NNNNN     PTRA[INDEX]       'use PTRA + INDEX*SCALE
    1100NNNNN     PTRB[INDEX]       'use PTRB + INDEX*SCALE
    1011NNNNN     PTRA++[INDEX]     'use PTRA,                PTRA += INDEX*SCALE
    1111NNNNN     PTRB++[INDEX]     'use PTRB,                PTRB += INDEX*SCALE
    1011nnnnn     PTRA--[INDEX]     'use PTRA,                PTRA -= INDEX*SCALE
    1111nnnnn     PTRB--[INDEX]     'use PTRB,                PTRB -= INDEX*SCALE
    1010NNNNN     ++PTRA[INDEX]     'use PTRA + INDEX*SCALE,  PTRA += INDEX*SCALE
    1110NNNNN     ++PTRB[INDEX]     'use PTRB + INDEX*SCALE,  PTRB += INDEX*SCALE
    1010nnnnn     --PTRA[INDEX]     'use PTRA - INDEX*SCALE,  PTRA -= INDEX*SCALE
    1110nnnnn     --PTRB[INDEX]     'use PTRB - INDEX*SCALE,  PTRB -= INDEX*SCALE
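
The table above can be cross-checked with a small encoder. This is a sketch in Python, not anything taken from PNut.exe: `encode_ptr` and its parameter names are invented here, and the bit layout is simply the nine-bit I, S, U, P, five-bit-index arrangement proposed in the table.

```python
def encode_ptr(ptrb: bool, update: bool, post: bool, index: int) -> int:
    """Build the 9-bit field: I=1, then S, U, P, then a 5-bit signed index."""
    if not -16 <= index <= 15:
        raise ValueError("index must fit in five signed bits")
    return (
        (1 << 8)              # I = 1 selects a PTRx expression
        | (int(ptrb) << 7)    # S: 0 = PTRA, 1 = PTRB
        | (int(update) << 6)  # U: 1 = write the new value back to PTRx
        | (int(post) << 5)    # P: 1 = post-modify (use PTRx unchanged)
        | (index & 0x1F)      # two's-complement index in the low five bits
    )

def as_bits(field: int) -> str:
    """Format a field as the nine-character binary string used in the table."""
    return format(field, "09b")
```

For example, `as_bits(encode_ptr(False, True, True, 1))` gives `101100001`, the PTRA++ row, and `as_bits(encode_ptr(True, True, False, -1))` gives `111011111`, the --PTRB row.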

cgracey · 2015-09-26 22:20

Roy Eltham wrote: »

potatohead wrote: »

Not everyone hates it... I think it's a perfectly great idea that makes good use of the addressing scheme.

I really, really, really don't think we should change it. Having that region under $1000, non aligned as it is, makes for a perfect boot code area.

If I were king, I would add a write protect bit and some instruction state type latch to that bit so the region can be loaded from ROM, then kept from being trashed easily to simulate having a real ROM there. This is one pretty great feature P2 "hot" had, and I think we should keep it in this P2.

The beauty of doing it this way is there being no need to compromise on ROM / RAM like we had to in the "hot" chip. In that one, we all wanted to keep the ROM small, because it cost us RAM.

In this one, if we were to add a write protect bit for that region, or maybe the whole 16K region ($4000), debug, dev, monitor, and whatever else could go there. The chip would ship with some stuff, and a binary image could ship with other stuff. We get the choice of default ROM, use it all as RAM, or a newer "rom" and I think people would take great advantage of this facility to offer up tools, utilities, etc...

Say one is done with the development and or just wants to use the RAM as RAM. Simply include whatever you want in your binary image, and write over the area, using it as a data / font whatever buffer. Ignore the non-aligned code feature and carry on.

The kluge to make hub exec work offset by 1 at addresses below $1000 is just that... a kluge. It's obviously a hack, and looks really lame to have in hardware as the way things work. Also, it doesn't give any advantage at all, I don't understand why anyone thinks it does?!?!

Having the 16k ROM be loaded and then jumping to $0001 instead of $1000 is the difference in practice. I don't see how it uses memory more efficiently at all. There is still 16k of ROM loaded into the first 16k of hub. The only difference is where the entry point is. However, things now work differently when you branch to addresses below $1000, depending on whether your branch target is aligned or not. This applies all the time, and it's odd and makes little sense.

Things should work in straight forward easy to describe ways, not with weird gotcha kluges.

Here's the thing, though. The assembler will never let you create cog code at non-long-aligned addresses. In other words, you will never be jumping to cog locations that are not long-aligned. If you jump to low addresses that are not long aligned, it can only be hub code.
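
The rule being argued over can be written down as a small model. This is a Python sketch of the behaviour as described in this thread for the current FPGA image, not a statement about any later design; the function name `branch_mode` and the returned labels are made up.

```python
HUB_EXEC_START = 0x1000  # the boundary discussed in this thread

def branch_mode(address: int) -> str:
    """Classify a branch target as cog/LUT exec or hub exec."""
    if address >= HUB_EXEC_START:
        return "hub"
    # Below $1000, only long-aligned targets are treated as cog/LUT code.
    # The assembler never emits cog code at an unaligned address, so an
    # unaligned low target can only be hub code.
    return "cog/LUT" if address % 4 == 0 else "hub"
```

Under this model `branch_mode(0x0001)`, the loader's ORGH 1 entry point, is hub exec, while `branch_mode(0x0000)` is cog exec.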

Cluso99 · 2015-09-26 22:39

Can someone test this please (not sure about the rep instruction)?

dat
        orgh    1

' launch 15 cogs (cog 0 falls through and runs 'blink', too)
' any cogs missing from the FPGA won't blink

        rep     #1,#15-1        'repeat 1 instruction 15 times
        coginit #16,#blink               '<--- think this might need to be coginit #3,#blink ???

blink   cogid   x               'which cog am I?
        cogid   pin
        setb    dirb,pin        'make that pin an output
        add     x,#16           'add to my id
        shl     x,#18           'shift it up to make it big

        rep     #2,#0           'repeat 2 instructions for ever
        notb    outb,pin        'flip its output state
        waitx   x               'wait that many clocks

me      jmp     @me             'never gets here!

        org
x       res     1               'variable at cog address 32 (register 8, RAM)
pin     res     1
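
To see why each cog blinks at a different rate, the delay arithmetic in `blink` can be worked through in Python. This is only a sketch: the 80 MHz clock is an assumed figure for illustration, not the FPGA's documented speed, and any fixed overhead of the instructions themselves is ignored.

```python
ASSUMED_CLOCK_HZ = 80_000_000  # hypothetical clock rate, for illustration only

def wait_clocks(cog_id: int) -> int:
    """Value passed to WAITX: (cog id + 16) shifted left by 18 bits."""
    return (cog_id + 16) << 18

for cog_id in (0, 1, 15):
    clocks = wait_clocks(cog_id)
    seconds = clocks / ASSUMED_CLOCK_HZ
    print(f"cog {cog_id:2d}: {clocks} clocks, about {seconds:.3f} s per toggle")
```

Cog 0 waits 4,194,304 clocks between toggles and cog 15 waits 8,126,464, so the highest-numbered cog blinks at roughly half the rate of cog 0.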

cgracey · 2015-09-26 22:55

Cluso99 wrote: »

Can someone test this please (not sure about the rep instruction)?

dat
        orgh    1

' launch 15 cogs (cog 0 falls through and runs 'blink', too)
' any cogs missing from the FPGA won't blink

        rep     #1,#15-1        'repeat 1 instruction 15 times
        coginit #16,#blink

blink   cogid   x               'which cog am I?
        cogid   pin
        setb    dirb,pin        'make that pin an output
        add     x,#16           'add to my id
        shl     x,#18           'shift it up to make it big

        rep     #2,#0           'repeat 2 instructions for ever
        notb    outb,pin        'flip its output state
        waitx   x               'wait that many clocks

me      jmp     @me             'never gets here!

        org
x       res     1               'variable at cog address 32 (register 8, RAM)
pin     res     1

REP only works in cog/LUT-exec mode.

I made a change the other day that causes it to just fall through in hub-exec mode, rather than blow up.

Cluso99 · 2015-09-26 23:06

Chip,
It is really nice to finally see a new P2 release. Well done!

I totally agree with Roy that orgh 1 seems a kluge. I have a few ideas to solve this.
However, for now, let's proceed with some testing.

Will you be releasing a DE0-Nano and/or BeMicroCV (5CEFA2F23C8N) version?
Would you like to post the pinout mapping for the 123 A7? We could then re-do the mappings for these to help.

Seairth · 2015-09-26 23:24

Cluso99 wrote: »

blink   cogid   x               'which cog am I?
        cogid   pin

Also, I thought there was something special about calling COGID twice in a row. Or was that a P2-hot thing? Or am I just recalling incorrectly?

Cluso99 · 2015-09-26 23:26

cgracey wrote: »
Cluso99 wrote: »
Can someone test this please (not sure about the rep instruction)?
dat
        orgh    1

' launch 15 cogs (cog 0 falls through and runs 'blink', too)
' any cogs missing from the FPGA won't blink

        rep     #1,#15-1        'repeat 1 instruction 15 times
        coginit #16,#blink     '<---I think this should be coginit #3,#blink
blink   cogid   x               'which cog am I?
        .....
REP only works in cog/LUT-exec mode.

I made a change the other day that causes it to just fall through in hub-exec mode, rather than blow up.

Thanks Chip. Now I recall some things don't work in hubexec.
BTW I edited the coginit after you read my code. I think it should be coginit #3,#blink ???

Ariba · 2015-09-27 00:07

Chip

Do we also get an FPGA image for the DE0-Nano with the Adapter board?
It was a supported board for the P2-hot, and I and maybe others bought it specially for P2 development and tests.
The pin-assignment files and other necessary files should already exist, and this adapter board would then not be wasted.

Not sure how many cogs it can fit, but I was quite happy with the one P2-hot cog. Okay there were 4 hardware tasks then.
Perhaps now 2 cogs of the new simplified P2 will fit, without the cordic.

I hope some others also want to play with the DE0-Nano, otherwise it doesn't make much sense to go to the effort.
I understand that you can't support too many different boards.

-Andy

Edit: I see Cluso99 also asked for it, so we are already two..

Seairth · 2015-09-27 00:13

(moved to a separate thread.)

Cluso99 · 2015-09-27 00:20

Seairth wrote: »
Cluso99 wrote: »
blink   cogid   x               'which cog am I?
        cogid   pin
Also, I thought there was something special about calling COGID twice in a row. Or was that a P2-hot thing? Or am I just recalling incorrectly?

Nothing special. You could replace the second cogid with mov pin,x

re your following post about REP
I agree something nicer would be better. It will be the job of the compiler (not the FPGA image) so perhaps we can wait for this. We get by with Chip's pnut.exe and save complexities for Roy's open compiler later.

I would also like to see CALL -> CALLS since it uses the internal stack.
CALLD is fine for me although CALLR (cog register) would work fine too.

Seairth · 2015-09-27 00:28

Cluso99 wrote: »

BTW I edited the coginit after you read my code. I think it should be coginit #3,#blink ???

I think the first parameter is:

#0 - #15 : specific cog
#16 : next available cog (WC indicates whether a cog was started or not)
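
That reading of the first parameter can be modelled in a few lines of Python. This is a sketch of the interpretation given here, not confirmed behaviour: the function name `coginit_target` is invented, picking the lowest-numbered free cog is an assumption, and the return value of `None` stands in for whatever WC reports when no cog could be started.

```python
from typing import Optional, Set

NUM_COGS = 16
NEXT_AVAILABLE = 16  # selector meaning "start any free cog"

def coginit_target(selector: int, busy: Set[int]) -> Optional[int]:
    """Return the cog that would be started, or None if none is free."""
    if 0 <= selector < NUM_COGS:
        return selector  # a specific cog is (re)started regardless of state
    if selector == NEXT_AVAILABLE:
        for cog in range(NUM_COGS):
            if cog not in busy:
                return cog  # assumption: lowest-numbered free cog wins
        return None
    raise ValueError("selector must be 0..16")
```

With cogs 0 and 1 already running, `coginit_target(16, {0, 1})` picks cog 2; with all sixteen busy it returns `None`.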

Since the cog memory is no longer automatically cleared/reloaded, I think it would also be possible to have one cog force another cog to "jmp". For instance, an auxiliary cog could run a snippet (instigated by another cog), then call cogstop on itself.

Cluso99 · 2015-09-27 00:38

Seairth wrote: »

Cluso99 wrote: »

BTW I edited the coginit after you read my code. I think it should be coginit #3,#blink ???

I think the first parameter is:

#0 - #15 : specific cog
#16 : next available cog (WC indicates whether a cog was started or not)

Of course. The #blink is the start address - need more coffee

Wonder where you find out which cog started when using next available cog?

Since the cog memory is no longer automatically cleared/reloaded, I think it would also be possible to have one cog force another cog to "jmp". For instance, an auxiliary cog could run a snippet (instigated by another cog), then call cogstop on itself.

I believe that is the plan
