What loader protocol is required to load code into a P2? Is it the same as it was with P2-hot? I'd like to make a loader that will run on the Mac and Linux without having to resort to using wine to run PNut.exe.
It's been a while since I've looked at the new P2 instruction set. What does "@" mean? Is that relative addressing? Also, there are lots of CALLx (CALLA, CALLB, CALLD) instructions. Where do they put their return addresses? Are there still PTRA and PTRB registers? I assume CALL still works like before.
I think I almost understand the blinky example, but some things look magic...
What does "orgh 1" do? Why not just "orgh"? Does this code start at $1000 (I think so)?
The last two lines with org and res x are hurting my brain...
Does the compiler load anything after "org" into cog before starting?
Or does this only work for "res" reserved space that doesn't need initializing?
That ORGH 1 is there because that's where the loader jumps into your code. It's that non-aligned hub exec below $1000 that people hate. I just haven't changed it yet. I kind of don't want to, because it allows most efficient use of memory. You could always just put a JMP #$1000 after it and pretend it's not really happening.
That ORG + RES business was just a quick way to get some symbolic cog registers declared. It doesn't generate any code. Each blinking cog will use its own instance of those registers.
The ROM_Booter.spin is what runs on boot. It doesn't handle anything, yet, but serial loading. Then, MainLoader.spin gets downloaded by PNut.exe and it receives all the memory data and JMPs to your app. The last three longs get customized by PNut.exe for the board's RAM size and speed.
Not everyone hates it... I think it's a perfectly great idea that makes good use of the addressing scheme.
I really, really, really don't think we should change it. Having that region under $1000, non aligned as it is, makes for a perfect boot code area.
If I were king, I would add a write protect bit and some instruction state type latch to that bit so the region can be loaded from ROM, then kept from being trashed easily to simulate having a real ROM there. This is one pretty great feature P2 "hot" had, and I think we should keep it in this P2.
The beauty of doing it this way is there being no need to compromise on ROM / RAM like we had to in the "hot" chip. In that one, we all wanted to keep the ROM small, because it cost us RAM.
In this one, if we were to add a write protect bit for that region, or maybe the whole 16K region ($4000), debug, dev, monitor, and whatever else could go there. The chip would ship with some stuff, and a binary image could ship with other stuff. We get the choice of default ROM, use it all as RAM, or a newer "rom" and I think people would take great advantage of this facility to offer up tools, utilities, etc...
Say one is done with development and/or just wants to use the RAM as RAM. Simply include whatever you want in your binary image and write over the area, using it as a data / font / whatever buffer. Ignore the non-aligned code feature and carry on.
Okay, so ROM_Booter.spin is the only thing that is fixed. MainLoader.spin is the second-stage loader and could be different for different loader implementations, right?
What is the full register set and addresses?
CALLA uses PTRA
CALLB uses PTRB
CALL uses the internal 8-level hardware stack
CALLD is the link instruction
# is absolute 20-bit address
@ is relative 20-bit address, 9-bit for D,S branches
Thanks! What are the semantics of PTRA and PTRB? Are they stack pointers that auto increment and decrement?
Yes. And any offset expressed is scaled by the word size.
Ah, okay. So the new P2 instruction set still has all of the PTRx addressing modes of P2-hot? I didn't realize that.
PTRx expressions:
INDEX = -32..+31 for simple offsets, 0..31 for ++'s, or 0..32 for --'s
SCALE = 1 for byte, 2 for word, 4 for long, or 32 for wide
S = 0 for PTRA, 1 for PTRB
U = 0 to keep PTRx same, 1 to update PTRx
P = 0 to use PTRx + INDEX*SCALE, 1 to use PTRx (post-modify)
NNNNNN = INDEX
nnnnnn = -INDEX
SUPNNNNNN PTR expression
-----------------------------------------------------------------------------
000000000 PTRA 'use PTRA
100000000 PTRB 'use PTRB
011000001 PTRA++ 'use PTRA, PTRA += SCALE
111000001 PTRB++ 'use PTRB, PTRB += SCALE
011111111 PTRA-- 'use PTRA, PTRA -= SCALE
111111111 PTRB-- 'use PTRB, PTRB -= SCALE
010000001 ++PTRA 'use PTRA + SCALE, PTRA += SCALE
110000001 ++PTRB 'use PTRB + SCALE, PTRB += SCALE
010111111 --PTRA 'use PTRA - SCALE, PTRA -= SCALE
110111111 --PTRB 'use PTRB - SCALE, PTRB -= SCALE
000NNNNNN PTRA[INDEX] 'use PTRA + INDEX*SCALE
100NNNNNN PTRB[INDEX] 'use PTRB + INDEX*SCALE
011NNNNNN PTRA++[INDEX] 'use PTRA, PTRA += INDEX*SCALE
111NNNNNN PTRB++[INDEX] 'use PTRB, PTRB += INDEX*SCALE
011nnnnnn PTRA--[INDEX] 'use PTRA, PTRA -= INDEX*SCALE
111nnnnnn PTRB--[INDEX] 'use PTRB, PTRB -= INDEX*SCALE
010NNNNNN ++PTRA[INDEX] 'use PTRA + INDEX*SCALE, PTRA += INDEX*SCALE
110NNNNNN ++PTRB[INDEX] 'use PTRB + INDEX*SCALE, PTRB += INDEX*SCALE
010nnnnnn --PTRA[INDEX] 'use PTRA - INDEX*SCALE, PTRA -= INDEX*SCALE
110nnnnnn --PTRB[INDEX] 'use PTRB - INDEX*SCALE, PTRB -= INDEX*SCALE
That is all the same, but with one difference. Now, there are only five bits of offset, so you have a +15 to -16 range. Those SUP bits have moved down by one. Now, if the MSB is zero, you access immediate addresses 00 to FF.
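For anyone who wants to check their reading of the table, here is a minimal Python sketch of the three addressing forms (plain offset, post-modify, pre-modify). It only models the table as posted, not the hardware. The name `ptr_access` and its arguments are made up for illustration, and `index` is passed as a signed value, so `PTRA--` is `index=-1`.

```python
# SCALE values as listed in the PTRx table.
SCALE = {"byte": 1, "word": 2, "long": 4, "wide": 32}

def ptr_access(ptr, index=0, size="long", update=False, post=False):
    """Return (effective_address, new_ptr) for one PTRx access.

    update=False, post=False : PTRx[INDEX]    use PTRx + INDEX*SCALE, keep PTRx
    update=True,  post=True  : PTRx++[INDEX]  use PTRx, then PTRx += INDEX*SCALE
    update=True,  post=False : ++PTRx[INDEX]  use PTRx + INDEX*SCALE and keep that value
    """
    delta = index * SCALE[size]
    address = ptr if post else ptr + delta      # P bit: post-modify uses the old pointer
    new_ptr = ptr + delta if update else ptr    # U bit: write the pointer back or not
    return address, new_ptr

# Stack-style use: push a long with PTRA++, pop it again with --PTRA.
ptra = 0x2000
addr, ptra = ptr_access(ptra, +1, "long", update=True, post=True)
print(hex(addr), hex(ptra))   # 0x2000 0x2004
addr, ptra = ptr_access(ptra, -1, "long", update=True, post=False)
print(hex(addr), hex(ptra))   # 0x2000 0x2000
```

The push and the pop hit the same address, which is the auto increment / decrement stack behaviour asked about earlier.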
The kluge to make hub exec work offset by 1 at addresses below $1000 is just that... a kluge. It's obviously a hack, and looks really lame to have in hardware as the way things work. Also, it doesn't give any advantage at all, I don't understand why anyone thinks it does?!?!
Having the 16k ROM be loaded and then jumping to $0001 instead of $1000 is the difference in practice. I don't see how it more efficiently uses memory at all. There is still 16k of ROM loaded into the first 16k of hub. The only difference is where the entry point is. However, now things work differently when you branch to addresses below $1000 depending on whether your branch target is aligned or not. This is all the time, and it's odd and makes little sense.
Things should work in straightforward, easy-to-describe ways, not with weird gotcha kluges.
Ummm... Is there any reason that the ROM has to be loaded starting at $0000? Why can't it be loaded starting at $1000 in the first place?
Thanks Chip. So the encoding now looks like this? And the I bit in the instruction determines whether the S field is interpreted as a register or as below.
PTRx expressions:
INDEX = -16..+15 for simple offsets, 0..15 for ++'s, or 0..16 for --'s
SCALE = 1 for byte, 2 for word, 4 for long, or 32 for wide
S = 0 for PTRA, 1 for PTRB
U = 0 to keep PTRx same, 1 to update PTRx
P = 0 to use PTRx + INDEX*SCALE, 1 to use PTRx (post-modify)
NNNNN = INDEX
nnnnn = -INDEX
ISUPNNNNN PTR expression
-----------------------------------------------------------------------------
0NNNNNNNN #NNNNNNNN
100000000 PTRA 'use PTRA
110000000 PTRB 'use PTRB
101100001 PTRA++ 'use PTRA, PTRA += SCALE
111100001 PTRB++ 'use PTRB, PTRB += SCALE
101111111 PTRA-- 'use PTRA, PTRA -= SCALE
111111111 PTRB-- 'use PTRB, PTRB -= SCALE
101000001 ++PTRA 'use PTRA + SCALE, PTRA += SCALE
111000001 ++PTRB 'use PTRB + SCALE, PTRB += SCALE
101011111 --PTRA 'use PTRA - SCALE, PTRA -= SCALE
111011111 --PTRB 'use PTRB - SCALE, PTRB -= SCALE
1000NNNNN PTRA[INDEX] 'use PTRA + INDEX*SCALE
1100NNNNN PTRB[INDEX] 'use PTRB + INDEX*SCALE
1011NNNNN PTRA++[INDEX] 'use PTRA, PTRA += INDEX*SCALE
1111NNNNN PTRB++[INDEX] 'use PTRB, PTRB += INDEX*SCALE
1011nnnnn PTRA--[INDEX] 'use PTRA, PTRA -= INDEX*SCALE
1111nnnnn PTRB--[INDEX] 'use PTRB, PTRB -= INDEX*SCALE
1010NNNNN ++PTRA[INDEX] 'use PTRA + INDEX*SCALE, PTRA += INDEX*SCALE
1110NNNNN ++PTRB[INDEX] 'use PTRB + INDEX*SCALE, PTRB += INDEX*SCALE
1010nnnnn --PTRA[INDEX] 'use PTRA - INDEX*SCALE, PTRA -= INDEX*SCALE
1110nnnnn --PTRB[INDEX] 'use PTRB - INDEX*SCALE, PTRB -= INDEX*SCALE
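To double-check the bit patterns in this revised table, here is a small Python sketch that builds the 9-bit field from its parts (I=1, then S, U, P, then a 5-bit two's-complement index). It is an illustration of the table as written here, not a confirmed description of the silicon. `encode_ptr` and `bits` are hypothetical helper names.

```python
def encode_ptr(ptrb=False, index=0, update=False, post=False):
    """Build the 9-bit I,S,U,P,NNNNN field for a PTRx expression.

    ptrb   : False for PTRA, True for PTRB            (S bit)
    update : True to write the modified pointer back  (U bit)
    post   : True to use PTRx before modification     (P bit)
    index  : signed offset, -16..+15 (so PTRA-- is index=-1)
    """
    if not -16 <= index <= 15:
        raise ValueError("index must fit in 5 signed bits (-16..+15)")
    return ((1 << 8)                # I bit set: field is a PTR expression
            | (int(ptrb) << 7)
            | (int(update) << 6)
            | (int(post) << 5)
            | (index & 0b11111))    # two's-complement index

def bits(value):
    """Format a field as a 9-digit binary string, as in the table."""
    return format(value, "09b")

print(bits(encode_ptr()))                                   # PTRA
print(bits(encode_ptr(index=1, update=True, post=True)))    # PTRA++
print(bits(encode_ptr(index=-1, update=True)))              # --PTRA
print(bits(encode_ptr(ptrb=True, index=5)))                 # PTRB[5]
```

The first three lines printed should match the PTRA, PTRA++ and --PTRA rows of the table.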
Here's the thing, though. The assembler will never let you create cog code at non-long-aligned addresses. In other words, you will never be jumping to cog locations that are not long-aligned. If you jump to low addresses that are not long aligned, it can only be hub code.
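The rule being argued about can be written down in a few lines. This is a sketch under two assumptions that are not spelled out in the thread: that branch targets are byte addresses (the blinky comment puts register 8 at cog address 32) and that "aligned" means a multiple of 4. The function name `exec_mode` is made up for illustration.

```python
HUB_EXEC_BOUNDARY = 0x1000   # addresses at or above this are always hub exec

def exec_mode(address):
    """Classify a branch target as 'cog' (cog/LUT exec) or 'hub' (hub exec).

    Assumption: below $1000, a long-aligned target is cog code and a
    non-aligned target can only be hub code.
    """
    if address < HUB_EXEC_BOUNDARY and address % 4 == 0:
        return "cog"
    return "hub"

print(exec_mode(0x0001))   # hub  (the ORGH 1 entry point)
print(exec_mode(0x0020))   # cog  (long-aligned, below $1000)
print(exec_mode(0x1000))   # hub
```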
Can someone test this please (not sure about the rep instruction)?
dat
        orgh    1
' launch 15 cogs (cog 0 falls through and runs 'blink', too)
' any cogs missing from the FPGA won't blink
        rep     #1,#15-1        'repeat 1 instruction 15 times
        coginit #16,#blink      '<--- think this might need to be coginit #3,#blink ???
blink   cogid   x               'which cog am I?
        cogid   pin
        setb    dirb,pin        'make that pin an output
        add     x,#16           'add to my id
        shl     x,#18           'shift it up to make it big
        rep     #2,#0           'repeat 2 instructions forever
        notb    outb,pin        'flip its output state
        waitx   x               'wait that many clocks
me      jmp     @me             'never gets here!
        org
x       res     1               'variable at cog address 32 (register 8, RAM)
pin     res     1
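The delay arithmetic in the blinky (`add x,#16` then `shl x,#18`) is easy to check on the host. A quick Python sketch, assuming `waitx` waits roughly the given number of clocks and ignoring any fixed instruction overhead. The clock frequency is a parameter because the thread does not state one, and both function names are hypothetical.

```python
def blink_wait(cog_id):
    """Clocks passed to WAITX by a given cog: (cogid + 16) << 18."""
    return (cog_id + 16) << 18

def toggle_interval_s(cog_id, clock_hz):
    """Approximate time between pin toggles, for an assumed clock frequency."""
    return blink_wait(cog_id) / clock_hz

for cog in (0, 1, 15):
    print(cog, blink_wait(cog))
# 0 4194304
# 1 4456448
# 15 8126464
```

So each cog blinks a little slower than the one before it, and cog 15 toggles at a bit less than half the rate of cog 0.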
REP only works in cog/LUT-exec mode.
I made a change the other day that causes it to just fall through in hub-exec mode, rather than blow up.
Chip,
It is really nice to finally see a new P2 release. Well done!
I totally agree with Roy that orgh 1 seems a kluge. I have a few ideas to solve this.
However, for now, let's proceed with some testing.
Will you be releasing a DE0-Nano and/or BeMicroCV (5CEFA2F23C8N) version?
Would you like to post the pinout mapping for the 123 A7? We could then re-do the mappings for these to help.
Thanks Chip. Now I recall some things don't work in hubexec.
BTW I edited the coginit after you read my code. I think it should be coginit #3,#blink ???
Do we also get an FPGA image for the DE0-Nano with the Adapter board?
It was a supported board for the P2-hot, and I and maybe others bought it specially for P2 development and tests.
The pin-assignment files and other necessary files should already exist, and also this adapter board would not be wasted.
Not sure how many cogs it can fit, but I was quite happy with the one P2-hot cog. Okay there were 4 hardware tasks then.
Perhaps now 2 cogs of the new simplified P2 will fit, without the cordic.
I hope some others also want to play with the DE0-Nano, otherwise it makes not much sense to take the effort.
I understand that you can't support too many different boards.
-Andy
Edit: I see Cluso99 also asked for it, so we are already two..
Also, I thought there was something special about calling COGID twice in a row. Or was that a P2-hot thing? Or am I just recalling incorrectly?
Nothing special. Could replace the second cogid with mov pin,x
re your following post about REP
I agree something nicer would be better. It will be the job of the compiler (not the FPGA image) so perhaps we can wait for this. We get by with Chip's pnut.exe and save complexities for Roy's open compiler later.
I would also like to see CALL -> CALLS since it uses the internal stack.
CALLD is fine for me although CALLR (cog register) would work fine too.
I think the first parameter is:
#0 - #15 : specific cog
#16 : next available cog (WC indicates whether a cog was started or not)
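Here is a toy Python model of that reading of the first COGINIT parameter. It is a sketch of the description above, not of the actual hardware. Picking the lowest-numbered free cog is an assumption, and returning `None` stands in for whatever the WC flag reports when no cog could be started. The names `coginit` and `NEXT_AVAILABLE` are made up for illustration.

```python
NEXT_AVAILABLE = 16   # per the description above: #16 means "next available cog"

def coginit(running, target):
    """Start a cog in the model; return its id, or None if none was free.

    running : set of cog ids that are currently running (modified in place)
    target  : 0..15 for a specific cog, 16 for the next available cog
    """
    if 0 <= target <= 15:
        running.add(target)          # a specific cog is simply (re)started
        return target
    if target == NEXT_AVAILABLE:
        for cog in range(16):        # assumption: lowest free id wins
            if cog not in running:
                running.add(cog)
                return cog
        return None                  # nothing free: the "not started" case
    raise ValueError("target must be 0..16")

# The blinky case: cog 0 is running and asks for 15 more cogs.
running = {0}
started = [coginit(running, NEXT_AVAILABLE) for _ in range(15)]
print(started)                              # cogs 1 through 15
print(coginit(running, NEXT_AVAILABLE))     # None, all 16 are busy
```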
Since the cog memory is no longer automatically cleared/reloaded, I think it would also be possible to have one cog force another cog to "jmp". For instance, an auxiliary cog could run a snippet (instigated by another cog), then call cogstop on itself.
Of course. The #blink is the start address - need more coffee
Wonder where you find out which cog started when using next available cog?
Comments
Have fun. I'll be watching with great interest.
Correct.
That works fine too. Not sure why Chip thinks it needs to be loaded at $0 and make this kluge to use that memory for hubexec.
Wonder where you find out which cog started when using next available cog? I believe that is the plan