Someone mentioned this long ago, but I don't recall there ever being an answer/response to it: what if the longs were 34-bit?
That's more a viable option on FPGA, as the bits are almost free there.
The P2 silicon is an OnSemi memory compiler block, and I'm not sure they can generate 34b.
34b also messes with Byte and int16 overlays, so there are plenty of fish-hooks, for little benefit....
I would have no idea if On could do it.
The ALU/cordic path could remain 32bit, so that bit width for longs/words/bytes remains unchanged, with the two additional bits containing only, say, C & Z flags.
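To make the 34-bit idea concrete, here is a rough sketch of a stored long carrying a 32-bit value plus two flag bits. The bit positions (C in bit 33, Z in bit 32) and the names pack34/unpack34 are my own assumptions for illustration; nothing in the thread fixes a layout.

```python
VALUE_MASK = 0xFFFF_FFFF  # the 32-bit part seen by the ALU/cordic path


def pack34(value, c, z):
    """Pack a 32-bit value plus C and Z flags into one 34-bit word.

    Assumed layout (hypothetical): bit 33 = C, bit 32 = Z, bits 31..0 = value.
    """
    return ((c & 1) << 33) | ((z & 1) << 32) | (value & VALUE_MASK)


def unpack34(word):
    """Split a 34-bit word back into (value, c, z)."""
    return word & VALUE_MASK, (word >> 33) & 1, (word >> 32) & 1


if __name__ == "__main__":
    w = pack34(0xDEADBEEF, c=1, z=0)
    print(hex(w), unpack34(w))
```

The point is only that bytes, words and longs keep their usual widths in the low 32 bits, while the extra two bits ride along in memory.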
I can see the use for the WZ in the RDBYTE/WORD/LONG, but not really WC.
I can see the use for S/PTRx although I am unsure what use PTRx will be without auto-incrementing/decrementing (see lower).
But I cannot see the use for S being immediate #. Do we really require immediate for access to the first 512B of hub???
It would be nice to be able to use D as immediate # in RDxxxx, just like in WRxxxx. But it is not that important.
However, what would be really nice is these instructions...
CCCC xxxxx00 0Z@ DDDDDDDDD SSSSSSSSS RDBYTE D/@,S {WZ}
CCCC xxxxx01 0Z@ DDDDDDDDD SSSSSSSSS RDWORD D/@,S {WZ}
CCCC xxxxx10 0Z@ DDDDDDDDD SSSSSSSSS RDLONG D/@,S {WZ}
CCCC xxxxx00 1Z@ DDDDDDDDD SSSSSSSSS WRBYTE D/@,S {WZ}
CCCC xxxxx01 1Z@ DDDDDDDDD SSSSSSSSS WRWORD D/@,S {WZ}
CCCC xxxxx10 1Z@ DDDDDDDDD SSSSSSSSS WRLONG D/@,S {WZ}
where
S LONG %xxxxxxxx_xxxxhhhh_hhhhhhhh_hhhhhhhh ' x=0(future), h=20-bit byte hub address
(Note: xxLONG ignores bits[1:0], xxWORD ignores bit[0]. ie LONG & WORD boundaries enforced!)
D LONG %xxxxxxxx_xxxxxxxx_xxxxxccc_cccccccc ' x=0(future), c=11-bit long cog/lut contiguous address
(Note: x=0(future) are bits currently ignored by the P2 but could be used in a later P2 with expanded hub and/or cog/lut RAM)
This would mean that we could have the Destination COG/LUT address stored as an 11-bit result in a Cog Register and access it indirectly. There is an extra level of indirection when we use the @D and that may mean that we need an extra clock, but it's a small price to pay to be able to address COG/LUT contiguously. And it removes the requirement to have a RD/WRLUT instruction.
This keeps the usual RD/WR-BYTE/WORD/LONG simple and easy to explain/understand.
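A small sketch of how the S and D fields described above might be interpreted. This only models the masking rules from the notes (20-bit hub byte address, alignment enforced by ignoring low bits, 11-bit contiguous cog/lut long address). The function names, and the assumption that everything from $200 up lands in LUT space, are mine and not part of the proposal.

```python
HUB_ADDR_MASK = 0xF_FFFF   # h = 20-bit byte hub address
COGLUT_MASK = 0x7FF        # c = 11-bit long cog/lut address
ALIGN_BITS = {"byte": 0b00, "word": 0b01, "long": 0b11}


def hub_address(s, size):
    """Return the effective hub byte address for a RD/WRxxxx of the given size.

    Upper (x) bits are ignored; xxWORD ignores bit[0], xxLONG ignores bits[1:0].
    """
    return (s & HUB_ADDR_MASK) & ~ALIGN_BITS[size]


def cog_lut_target(d):
    """Return (region, long_address) for an indirect @D destination.

    Assumption: $000..$1FF is COG RAM, anything above is treated as LUT.
    """
    addr = d & COGLUT_MASK
    return ("cog" if addr < 0x200 else "lut"), addr


if __name__ == "__main__":
    print(hex(hub_address(0x12347, "long")))   # low two bits dropped
    print(cog_lut_target(0x234))
```

So a register holding $234 used as @D would name one location in the contiguous cog/lut space, with no separate RD/WRLUT needed.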
SETQ & SETQ2
Since there would no longer be any requirement to have separate cog and lut r/w hub instructions, SETQ2 would no longer be required.
SETQ should only apply to RD/WRLONG (or a new RD/WRBLOCK instruction). All block moves will be LONGs and must be on a LONG boundary.
So SETQ could become...
CCCC 1101011 LDD DDDDDDDDD 000010110 SETQ D/#
or
CCCC 1101011 DDL DDDDDDDDD 000010110 SETQ D/#
where...
D LONG %xxxxxxxx_xxxxxxxx_xxxxxccc_cccccccc 'x=0(future), c=11-bit long count for block move hub to/from cog/lut
(Note: when D is specified as an address in the SETQ instruction, D may only reside in COG RAM, ie D[10:9] will be ignored and should be "00")
Doesn't this simplify things and make it easier to understand as well ???
Would it be possible to load the ROM into the end of the addressable hub space of one MB? More than 1MB is not possible without modifying Hub Exec's PC anyway. Just put 16K RAM (or however long the ROM is) at the end there? Else put it at the end of the 512K.
There Hub Exec is possible, it can still be overwritten, but the memory at $1000 is available for the user and gives a nice addressing scheme for COG/LUT/HUB/ROM in one continuous space.
I see no problem in having the first 4K of HUB used just for Data. We will need some space for Cog Pasm Images anyways, having 16 Cogs. Like on the P1 this space can be reused for Buffers or other Data if needed.
But the decryption, the monitor and some serial could be quite helpful while developing, but not needed in the end product.
Now you have to keep them while developing and your program has to start at say $1200 to keep the copied ROM but in production it is at $1000. Not good.
Production Code on a locked device fails and your Development Code shows completely different addresses? Not good.
I've almost got the Prop123 FPGA release done. I was hoping it would be today, but it looks like tomorrow. I set it at 50 MHz, since everybody can run that. It fit 12 cogs at 97% capacity. When we add smart pins, that will probably drop to 10. It compiles with 12 cogs at an Fmax of 80 MHz, which means we could certainly run it at 100 MHz. Need the PLLs for that, though.
After the Prop123 version, I'll get the DE2-115 version together.
this would imply that whole COG+LUT space can be used for registers
with the only caveat, that immediate addressing is only available in the 9-bit COG address range
right??
No, definitely not at this time.
What it means is that we could load from hub to cog and/or lut, or save cog and/or lut to hub.
Apart from the fact that normal programs only have 9 bits available to address the registers ($000..1FF), COG RAM is dual ported permitting both D and S values to be read simultaneously, and permitting both I to be read and R (result of a previous instruction) to be written simultaneously. LUT is only single port RAM.
*WHEW!* I was actually worried that you would release it today. All my "real" work would have come to a screeching halt! Fortunately, it's raining all weekend, so a perfect excuse to play with an FPGA (starting tomorrow, of course)!
Any chance of a DE2-115 image today or will that have to wait until next week?
I've been working on it all day. I needed to get PNut.exe to work with multiple boards, so I thought I would do the DE2-115 and the Prop123 at the same time, to start.
I noticed a few little problems as I made all the compiles and cross-checks. So, I am fixing those right now. Maybe later tonight I will have something. Sorry this is taking so long.
Ah, not happy ... CALL D, CALLD, CALLD D! ... Just looking at what Cluso has compiled.
One problem with the relabelling of LINK to CALLD is there is no paired RETD because it's not quite a full stacked mechanism.
But obviously there is also the comprehension issue of D being used both as part of an opcode name and also a register direct place holder. This is particularly bad when both are in the one instruction together!
I kind of liked "LINK". I was thinking CALLR for 'register' might make more sense than CALLD, when there's only one operand, anyway.
Where is I/O now on DE2_115 with Prop board relative to where it's going to be finally?
I guess right now, I'm mainly interested in digital I/O. Is this working the same way now that it will in final chip?
I think I've heard that smart-pins are the next thing Chip will work on... I don't seem to remember what "smart pins" are...
Do they include the analog I/O modes and resistor pull-up and down modes? Or, are they just special digital modes?
I think it'd be fun to connect an LCD to DE2-115 or Prop123 and would like to figure out how to do that mechanically and code wise...
The streamer can write data directly to the i/o pins, not just to the DACs, up to 32 bits per clock, from hub or LUT.
It captures bytes, words, or longs. I like the idea of one, two, or four bits, as well, getting written as bytes! The rate is already programmable by SETXFRQ: $80000000 = every clock, $40000000 = every 2nd clock, $2AAAAAAB = every 3rd clock. In the every-third-clock case, the LSB must be set to ensure that it rolls over (reaches $80000000+) on the initial third clock. Bit 31 is not kept by the phase accumulator.
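Here is a quick way to check those SETXFRQ numbers. It is a minimal model, assuming the accumulator simply adds the frequency value each clock, fires a transfer when the sum reaches $80000000 or more, and keeps only bits 30..0. The function name transfer_clocks is made up for the sketch, and the real hardware may differ in detail.

```python
ROLLOVER = 0x8000_0000   # a sum at or above this triggers a transfer
KEEP_MASK = 0x7FFF_FFFF  # bit 31 is not kept by the phase accumulator


def transfer_clocks(freq, clocks):
    """Return the 1-based clock numbers on which a transfer would occur.

    Assumes 0 < freq <= $80000000 and an accumulator that starts at zero.
    """
    acc = 0
    hits = []
    for clock in range(1, clocks + 1):
        total = acc + freq
        if total >= ROLLOVER:
            hits.append(clock)
        acc = total & KEEP_MASK
    return hits


if __name__ == "__main__":
    for freq in (0x8000_0000, 0x4000_0000, 0x2AAA_AAAB, 0x2AAA_AAAA):
        print(f"${freq:08X}:", transfer_clocks(freq, 9))
```

With $2AAAAAAA (LSB clear) the sum after three clocks is only $7FFFFFFE, so in this model the first transfer slips to the fourth clock, which is why the value is rounded up to $2AAAAAAB.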
BTW I got it into my head that the RD/WRxxxx worked differently to how it really does
However, this got me thinking about what I had been thinking!
Here are the related instructions. I am unsure what they all are for, but here they are...
RDBYTE/RDWORD/RDLONG & WRBYTE/WRWORD/WRLONG
Let's look at some of those that we all know pretty well, and are similar to P1...
There Hub Exec is possible, it can still be overwritten, but the memory at $1000 is available for the user and gives a nice addressing scheme for COG/LUT/HUB/ROM in one continuous space.
I see no problem in having the first 4K of HUB used just for Data. We will need some space for Cog Pasm Images anyways, having 16 Cogs. Like on the P1 this space can be reused for Buffers or other Data if needed.
But the decryption, the monitor and some serial could be quite helpful while developing, but not needed in the end product.
Now you have to keep them while developing and your program has to start at say $1200 to keep the copied ROM but in production it is at $1000. Not good.
Production Code on a locked device fails and your Development Code shows completely different addresses? Not good.
Leaving the ROM copy always in? Not good either.
Thoughts?
Mike
Production code is written at $1000 always.
While developing, leave the ROM tools there. When done, clear them.
Optional write protect bit to prevent overwrites when developing and troubleshooting.
The chip allows non aligned hub code anyway.
Even 10 cogs would be fine, too, if it gives a bit more room for optimization/fitting
Refresh....refresh.....refresh.....
I was going to get the P123 A7 version, but then the issues came up and I figured I should wait for the A9 one.
There's a serious lack of documentation, yet, but those of you that have been playing with FPGA's will find your way.
I'll make a new thread and hope I can post the ~5MB file there.
IIRC Peter did a splendid job and put them up on google???
The P2 FPGA Image is finally here!
P2 Day is September 26, 2015. It will be celebrated for years to come.
Thanks Chip.
I'd be happy with CALLR. CALLK also.
JMPLR or JMPR or JMPLK or JMPK could work as this makes clear, just from the name, it's a branching operation - which LINK doesn't.
LINK is already well known naming so is also still fine, imho.
Is that up to 32 bits, one bit per clock?
Can you also read up to 32 bits, one per clock from a pin?
Is that instruction clock, or system clock?
Can the clock be exposed on an adjacent pin?
If the clock can be exposed, that gives us SPI master (half duplex) with the above for free.
For full duplex SPI, two pins would need to be sync'd with a third as a clock. An arbitrary other pin could be the chip select.
It captures bytes, words, or longs. I like the idea of one, two, or four bits, as well, getting written as bytes! The rate is already programmable by SETXFRQ: $80000000 = every clock, $40000000 = every 2nd clock, $2AAAAAAB = every 3rd clock. In the every-third-clock case, the LSB must be set to ensure that it rolls over (reaches $80000000+) on the initial third clock. Bit 31 is not kept by the phase accumulator.