That's why I'm hoping an updated instruction reference gets published first. Then we can use it to update Gear and play with the P2 in software. Useful both now *and* when the chip is released.
As Leon pointed out, the FPGA on the DE2-115 costs about $400 alone. If it were much cheaper, we could make a one-size-fits-all board. For now, I think we will support both the DE0 and DE2 boards. The DE0 is very cheap to make add-on boards for, since it uses .100" connectors, while the DE2 uses a special Samtec HSMC part that is expensive. The DE0 board would be great for writing single-cog apps in assembly, but not much else. The DE2 board would be better for developing tool systems, which need something closer to the whole chip.
So Chip, have you any thoughts about some kind of ICE in the binary blob for the DE2 boards? Even the simple single-step and examine registers and status will do.
Peter, out of the starting gate we won't have much support in the way of formal debuggers. You'll have serial debugging and the monitor program at first. Chip's efforts are focused on instruction set documentation, bootloader instructions, and then a Spin interpreter. I don't know whether a traditional debugger is possible with our architecture, at least at this time. You'll have GDB support in PropGCC, however, but that will also come a bit later (March or so).
Hi Ken, I know if I had the VHDL source I could probably add this functionality and that might or might not mess up the actual functioning of P2 but it's an option. I'm only asking if it's simple as we will have the binary blob and not the source (understandably). Hopefully the serial debugger will be good enough but we are used to manhandling bits and bytes so I am sure we will manage.
So Chip, have you any thoughts about some kind of ICE in the binary blob for the DE2 boards? Even the simple single-step and examine registers and status will do.
I will investigate this. I could use the dual ports on the FPGA memories to continuously copy a cog's RAM into hub RAM. Maybe these updates could be coordinated with single-steps.
Hi Chip, just out of curiosity: will the new Spin interpreter run from hub RAM only, like the Prop1, or will it be able to access external RAM in some way? Or would that be getting too intricate, since it would need a separate cog to drive the SDRAM? Or will you be able to map in functions, i.e. if another cog read some code in from the SDRAM?
It would be great if the Spin byte codes could be documented as well as the PASM instructions. This would allow other compilers to target the same byte codes, and it would make it easier to build alternate Spin VMs (e.g. for external memory, as per Bagger's request above). I am aware of the partial documentation in the code, and the work of David and others... but a supported set of official byte code docs would be great :-)
Mind you, documenting the P2 instructions well is FAR more important at the moment!
Could you also consider whether it's possible to bring out the four normally unconnected pins of physical port 2 to the mezzanine connector of the DE2, so one can use them to output things like running program status or even light serial debug information, using one free cog or even a "timely stolen" thread to do that?
I was wondering if even single-stepping could benefit from that setup.
If it's possible to do it this way, one could write debug code to run in parallel with many other concurrent tasks, without wasting precious physical resources reserved for the final application.
Hi Chip, just out of curiosity: will the new Spin interpreter run from hub RAM only, like the Prop1, or will it be able to access external RAM in some way? Or would that be getting too intricate, since it would need a separate cog to drive the SDRAM? Or will you be able to map in functions, i.e. if another cog read some code in from the SDRAM?
It could be made to do both. If it uses external SDRAM, there must be a cog dedicated to serving memory to all other tasks, as video will need big memory, too. I haven't gotten there yet, myself. Anything is possible, though.
DE2 has one 40-pin, 36 bit standard GPIO port, too
36 pins is exactly what is needed for a cog's worth of 9-bit DACs.
On the DE2 board we found that they wired some pins to the HSMC connector that were NOT under jumper control for 3.3V/2.5V/1.8V power selection. We will redesign our general-purpose board to avoid using those pins for DACs.
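The pin counts being discussed are simple to check. Here is a tiny sketch of the arithmetic in Python, assuming four DACs per cog (inferred from the "4x8" figure and the 4-DAC DE0-Nano board mentioned elsewhere in this thread); the function name is made up for illustration.

```python
def dac_pins(dacs_per_cog: int, bits_per_dac: int) -> int:
    """Number of FPGA I/O pins needed to drive one cog's DACs in parallel."""
    return dacs_per_cog * bits_per_dac


# Assumption: one cog drives four DACs.
FULL_RESOLUTION_PINS = dac_pins(4, 9)   # 9-bit DACs
REDUCED_PINS = dac_pins(4, 8)           # 8-bit DACs, as an alternative
```

With 9-bit DACs the count matches the 36-bit GPIO header; dropping to 8 bits would free four pins.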
It would be great if the Spin byte codes could be documented as well as the PASM instructions. This would allow other compilers to target the same byte codes, and it would make it easier to build alternate Spin VMs (e.g. for external memory, as per Bagger's request above). I am aware of the partial documentation in the code, and the work of David and others... but a supported set of official byte code docs would be great :-)
Mind you, documenting the P2 instructions well is FAR more important at the moment!
I'm working on the assembly instructions now. The new SPIN opcodes will not be ready until the compiler is working. They may be subject to some changes even after that as things evolve.
Could you also consider whether it's possible to bring out the four normally unconnected pins of physical port 2 to the mezzanine connector of the DE2, so one can use them to output things like running program status or even light serial debug information, using one free cog or even a "timely stolen" thread to do that?
I was wondering if even single-stepping could benefit from that setup.
If it's possible to do it this way, one could write debug code to run in parallel with many other concurrent tasks, without wasting precious physical resources reserved for the final application.
Yanomani
I've been thinking that for the FPGA implementations, we could have an instruction that dumps the entire cog state into a stretch of hub RAM. That takes nothing to coordinate - it just happens when you execute the special instruction. Single-stepping would require some kind of interaction involving at least an input pin to say "step". It would be neat to not require another cog.
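To make the dump idea concrete, here is a rough software model in Python. It is only a sketch under assumptions that are not stated in the post: a cog is taken to hold 512 longs, hub RAM is modelled as a byte array with little-endian longs, and the names `dump_cog_state` and `read_hub_long` are invented for illustration. It says nothing about how the real instruction would be encoded or timed.

```python
import struct

COG_LONGS = 512  # assumption: 512 32-bit registers per cog


def dump_cog_state(cog_regs, hub_ram, base):
    """Copy every cog register into hub RAM starting at byte address `base`."""
    if len(cog_regs) != COG_LONGS:
        raise ValueError("expected a full cog image")
    if base % 4 or base + COG_LONGS * 4 > len(hub_ram):
        raise ValueError("dump window must be long-aligned and fit in hub RAM")
    # Assumption: longs are stored little-endian in hub RAM.
    struct.pack_into(f"<{COG_LONGS}I", hub_ram, base, *cog_regs)


def read_hub_long(hub_ram, addr):
    """Read one 32-bit long back from hub RAM (what a debugger would do)."""
    return struct.unpack_from("<I", hub_ram, addr)[0]
```

A debugger running elsewhere would then only need to read the dump window, for example `read_hub_long(hub, base + 4 * reg)` for register `reg`.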
It could be made to do both. If it uses external SDRAM, there must be a cog dedicated to serving memory to all other tasks, as video will need big memory, too. I haven't gotten there yet, myself. Anything is possible, though.
OK, excellent, just as I thought. And yes, anything is possible with a Prop, as we've seen with the Prop1.
Chip, I don't know if you saw my BCMM post, but you might want to take a quick look at it. It's an LMM-like trick that creates a code system very similar to a subset of Spin byte code; it runs very fast and takes a small fraction of the cog footprint, leaving much more room for other functions:
I've been thinking that for the FPGA implementations, we could have an instruction that dumps the entire cog state into a stretch of hub RAM. That takes nothing to coordinate - it just happens when you execute the special instruction. Single-stepping would require some kind of interaction involving at least an input pin to say "step". It would be neat to not require another cog.
When you stated "the entire cog state" did you mean the totality of the cog accessible memory (excluding HUB, of course), like its CLUT and every special register (PTRA, PTRB, SETINDA, SETINDB, current LFSR and System Counter, 64 bit accumulators, cache, SETPORx, SETMAP, etc...), i.e. everything that can affect cog operation?
If so, and if you can also provide a "cog independent" ability to dump this stretch of hub RAM to the outside world by any means you could create, a tremendously powerful development system could be brought to life, allowing almost any situation to be tested.
Dreaming a little farther, if the entire process could be reversed from the outside in, you surely can imagine where it arrives.
Fully configurable start/restart points, in a way scarcely attainable by almost any other means (except an impractically long wait for a specific situation to be reached). Freezing and later restoring breakpoints. One could simply "inject" a cog (or cogs, for sure) state and observe or interfere with its behavior.
Simulations could be shared, in different stages, between many developers to boost team work.
Something equivalent to a Warp Drive till Alpha Centauri. Maybe the Holy Grail Wormhole to Software Peripheral Paradise
Please, if you do anything like this, mark it as "NON SUPPORTED USE. USE AT YOUR OWN DISCRETION AND RISK" to avoid being flooded by forum/mail/phone enquiries about any green bugs (or whatever color they may grow into) appearing inside a badly configured state loaded into the now "offended" Prop 2 simulator.
Also, code protection must not coexist with this type of resource, to avoid misuse/tampering attempts.
As I stated above, it's only a dream. It might be impractical and/or undesirable to unleash such an uncontrollable "beast".
Hi Chip, I'm ready to fire up my DE2 board in the morning, any chance of getting something that will run?
P.S. Chip, if it is possible, can you make the DACs only 8 bits on the DE0?
That would use only 4x8 = 32 I/Os on JP2.
It would be simpler to manage both ports and to build I/O modules for the DE0.
I see that in the Propeller II the INx and OUTx ports are now combined into one PINx port.
Are there going to be any problems with back-to-back instructions that modify the pins? Like the SX read-modify-write issue?
For example on the SX if you had two back-to-back XOR instructions you would get problems because the pin changed in the first XOR instruction had not changed state yet when the second XOR read the pins.
When you stated "the entire cog state" did you mean the totality of the cog accessible memory (excluding HUB, of course), like its CLUT and every special register (PTRA, PTRB, SETINDA, SETINDB, current LFSR and System Counter, 64 bit accumulators, cache, SETPORx, SETMAP, etc...), i.e. everything that can affect cog operation?
There will be limits, and invisible-write will be more fabric-costly than read (but a write code stub is probably OK; debuggers have done such tricks for years).
What should be quite low-impact in an FPGA is a write-clone port, where you tap off the address and write data and create a remote copy of memory in a dual-port block. Reading that copy is entirely up to the user. The program counter would also be read.
Beyond that, there would be more routing and speed impact, and you move away from tested space.
Invisible R/W of everything is a nice holy grail, but some middle solution will be more practical.
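For anyone who wants to picture the write-clone port, here is a small behavioural model in Python. It is a sketch only: the class names `CogRam` and `ShadowCopy` are invented, the 512-long size is an assumption, and real hardware would do this with a second dual-port block rather than a callback. The point it illustrates is that the cog's own write path is untouched; the clone just listens to address and write data.

```python
class CogRam:
    """Cog memory whose write strobe can be observed by passive taps."""

    def __init__(self, size=512):  # assumption: 512 longs per cog
        self.mem = [0] * size
        self.taps = []

    def write(self, addr, data):
        data &= 0xFFFFFFFF
        self.mem[addr] = data
        # Each tap sees the same address and data as the real write.
        for tap in self.taps:
            tap(addr, data)


class ShadowCopy:
    """Remote copy of cog memory, updated only from tapped writes."""

    def __init__(self, size=512):
        self.mem = [0] * size

    def on_write(self, addr, data):
        self.mem[addr] = data

    def read(self, addr):
        # Reading the clone never disturbs the cog.
        return self.mem[addr]
```

Hooking it up is one line, `cog.taps.append(shadow.on_write)`, after which the shadow stays in step with every write the cog makes.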
Much thanks for the input.
External access, without having to use a cog to retrieve as much data as possible, will ease debugging a lot.
A piece of cake will satisfy my appetite! But... please don't blame me for dreaming of the entire dessert.
Prop 2 makes me hungry for new ways of doing new things. Better ones!
Let's go to it!
Cogs! At your marks!
Let's cog it, let's code it!
All the rest (of times) will be
The (not so) old way, we'll do it
Sometimes hard, however, it can be
Soft, soft, soft
Set a bit, reset a bit
Soft, soft, soft
Set a bit, reset a bit
Let's cog it, let's code it!
Yanomani
Well I've got this DE2-115 board sitting on my bench and it looks awesome, should be a good debug tool. Now, if only I could get hold of a P2 config file for it......
I see that in the Propeller II the INx and OUTx ports are now combined into one PINx port.
Are there going to be any problems with back-to-back instructions that modify the pins? Like the SX read-modify-write issue?
For example on the SX if you had two back-to-back XOR instructions you would get problems because the pin changed in the first XOR instruction had not changed state yet when the second XOR read the pins.
Bean
Here is the rule:
If PINx is accessed via S (source), or if PINx is accessed via D (destination) and the instruction doesn't write D, then the pin input states are read; otherwise, the pin output states are read.
MOV PINA,#$155 'write $00000155 to PINA outputs (writes)
ADD PINA,#1 'increment PINA outputs (writes)
TEST PINA,#$100 'test PINA inputs with $00000100 (doesn't write)
TEST mask,PINA 'test PINA inputs against mask (doesn't write)
Back-to-back XORs would be no problem, but other cases could be problematic. If you were incrementing a PINx output register and then did a compare, you would be reading inputs, not the data-forwarded-within-the-pipeline state of the outputs. This would be a problem. Just don't do that.
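The rule above can be written down as a small truth function. The following Python sketch is only a reading aid, not a model of the actual pipeline: the function name and the three boolean parameters are invented here, and it assumes PINx appears in at least one operand of the instruction being classified.

```python
def pinx_read_target(pinx_in_s: bool, pinx_in_d: bool, writes_d: bool) -> str:
    """Return which pin state an instruction sees when it touches PINx."""
    if pinx_in_s or (pinx_in_d and not writes_d):
        return "inputs"
    return "outputs"


# (instruction text, PINx in S?, PINx in D?, instruction writes D?)
EXAMPLES = [
    ("MOV PINA,#$155", False, True, True),
    ("ADD PINA,#1", False, True, True),
    ("TEST PINA,#$100", False, True, False),
    ("TEST mask,PINA", True, False, False),
]


def classify_examples():
    """Apply the rule to the four example instructions from the post."""
    return {text: pinx_read_target(s, d, w) for text, s, d, w in EXAMPLES}
```

The problematic case from the post falls out directly: an ADD to PINx works on the output states, but a following compare that doesn't write D is classified as reading "inputs", so it does not see the value that was just written.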
Well I've got this DE2-115 board sitting on my bench and it looks awesome, should be a good debug tool. Now, if only I could get hold of a P2 config file for it......
Peter,
Good man! I've got a .pof file for it, but it's all tied to my HSMC adapter board. You'll need one of those and I'm designing a new one to overcome some under-documented issues with the DE2-115 board, itself. I just designed the DE0-Nano board today, which will be laid out this week. Tomorrow, I'll work on the DE2-115 board and hopefully get it done, also. I will see if there's a way we could just plug a Propeller Plug into the .100" connector on the DE2-115 board so that you could, at least, get the monitor running. Give me a little time, like a few days here.
My DE0 has been in use for more than one month, so maybe I need that one first.
P.S. Chip, if it is possible, can you make the DACs only 8 bits on the DE0?
That would use only 4x8 = 32 I/Os on JP2.
It would be simpler to manage both ports and to build I/O modules for the DE0.
I've got the full 9-bit DACs for the DE0-Nano board designed. I will post something in a half-hour, or so. I made one big board that the DE0-Nano plugs upside down into. The adapter board is much bigger than the FPGA board, so it becomes the anchor.
Good man! I've got a .pof file for it, but it's all tied to my HSMC adapter board. You'll need one of those and I'm designing a new one to overcome some under-documented issues with the DE2-115 board, itself. I just designed the DE0-Nano board today, which will be laid out this week. Tomorrow, I'll work on the DE2-115 board and hopefully get it done, also. I will see if there's a way we could just plug a Propeller Plug into the .100" connector on the DE2-115 board so that you could, at least, get the monitor running. Give me a little time, like a few days here.
Sure, I can wait. I'll just count all those electrons going by in the meantime and have a bit of a play with the DE2 and its tools.
Here is the DE0-Nano board. This will run only 1 cog, but supports 4 DACs for VGA/component/composite video or audio. It also has 29 general-purpose I/Os (the last three are input-only and have push-buttons), the SPI Flash, and a reset button. This will get laid out this week and I hope we'll have boards next week. Tomorrow I hope to get the DE2-115 HSMC board done for 6-cog emulation.
Chip, I don't know if you saw my BCMM post, but you might want to take a quick look at it. It's an LMM-like trick that creates a code system very similar to a subset of Spin byte code; it runs very fast and takes a small fraction of the cog footprint, leaving much more room for other functions:
That is a neat idea. I could make an instruction that remaps bits from a byte or word into a 32-bit instruction, but it might be a little late for that now. If there were only one or two really useful remaps, they could be implemented pretty easily at RTL-design-time.
WOW! The FPGA boards could produce (with some effort later) a really impressive debugger - way better than what is available for other chips. Could be a really impressive selling point for commercial developers!
Chip: I am just finishing up some work here - expect another 2 weeks. I would love to help with the interpreter. As you know, I did a faster version where the byte codes were used as an offset into a hub table of 32-bit entries, each holding three 9-bit vectors with 5 bits left for other things. Decoding each bytecode fast is a major point because it helps every bytecode. The other part I did was to unthread the maths routines, which provided a major speedup. Together with some LMM mix, we could get some real speed out of the P2 interpreter. Coupled to that is the FIFO used as two stacks for the variables, etc. I am sure you have considered most of this anyway, but I'm just mentioning it in case.
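Here is a rough Python sketch of that table layout, to show how three 9-bit vectors and 5 spare bits fill exactly one 32-bit long. The field order (vector 0 in the low bits, flags in the top 5 bits) and all names are assumptions made for illustration; the post does not say how the entries were actually arranged.

```python
VECTOR_BITS = 9
VECTOR_MASK = (1 << VECTOR_BITS) - 1   # 0x1FF, one cog address
FLAG_BITS = 5
FLAG_MASK = (1 << FLAG_BITS) - 1       # 0x1F, the leftover bits


def pack_entry(v0, v1, v2, flags):
    """Pack three 9-bit vectors and 5 flag bits into one 32-bit table entry."""
    for v in (v0, v1, v2):
        if not 0 <= v <= VECTOR_MASK:
            raise ValueError("vector does not fit in 9 bits")
    if not 0 <= flags <= FLAG_MASK:
        raise ValueError("flags do not fit in 5 bits")
    # Assumed layout: v0 in bits 0..8, v1 in 9..17, v2 in 18..26, flags in 27..31.
    return v0 | (v1 << 9) | (v2 << 18) | (flags << 27)


def unpack_entry(entry):
    """Split a table entry back into (v0, v1, v2, flags)."""
    return (
        entry & VECTOR_MASK,
        (entry >> 9) & VECTOR_MASK,
        (entry >> 18) & VECTOR_MASK,
        (entry >> 27) & FLAG_MASK,
    )


def decode(table, bytecode):
    """Use the bytecode itself as the index into the hub-resident table."""
    return unpack_entry(table[bytecode])
```

Because the bytecode is the table index, decoding costs one hub read plus a few shifts and masks, which is why speeding it up benefits every bytecode.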
Comments
I like Gear, but since it has no way to connect to the real world, it is not possible to test everything in it.
Fair enough, but the cost point is much more palatable. I can wait for real world interaction until the P2 is released.
Link to the BCMM (Byte Code Memory Model) post mentioned above: http://forums.parallax.com/showthread.php?143147-Byte-Code-Memory-Model-BCMM&p=1135048#post1135048