On the other hand, instructions were also added to processor architectures with the intention of 'helping' compiler writers.
Back in the days of the minicomputers, Data General implemented their MV-Series CPU on a single microcoded chip.
Unfortunately, there wasn't enough microcode space to implement all of the instructions. So, the designers chose some instructions that would be emulated by the CPU instead.
Of course, the emulation would be much slower than when implemented in microcode, but if it's for instructions that aren't used that often, it won't make a difference.
Unless you pick the wrong instructions.
Apparently the designers didn't look at the code generated by all of their compilers... and the COBOL compiler output (quite often) one of those "helpful" instructions (I believe it was a numeric-to-text formatter, but I'm not sure).
The end result was that some COBOL programs ran about 6X SLOWER on the new hardware.
I had to make lots of assembler changes to properly support hub exec mode. I think it's all done now. I'm just recompiling the FPGA, since I found a minor bug. Hopefully, I'll have an FPGA update in a day or two.
Super, Chip!
Do you think we'll still be able to have something that fits into a Nano or are we all destined to upgrade to a DE2-115 or drop out of the testing program?
Last I heard, the UARTs are staying, and SERDES will likely be an alternate mode, with shared circuitry. I am eagerly awaiting what Chip cooks up.
Yes, it needs more user control and sync modes: SPI, I2S, QuadSPI (& JTAG?). Granular baud/length choices and 50MHz+ would be a solid base. Additional bit-level support for some USB primitives might also make it.
I am having trouble waiting for the next bitstream.
I was recompiling all yesterday for the DE0-Nano. I had to remove more than was required to simply fit the circuitry, in order to get a high-speed fit that will run at 80MHz reliably.
... the hub execution mode, which is working like a dream. It's really nice how you can call and jump anywhere, and in and out of cog and hub spaces. It's a whole new world for me.
Careful there, Chip, next thing you know you'll sneak in an interrupt or two!!
Shame to lose 32x32 multiply, 64/32 divide, SERB and CTRB.
How much extra space does relaxing the speed emphasis give you?
Is CORDIC not as costly as it sounds, or just harder to remove?
How many more LEs does it need for all the options? Is the BEmicro board, with 12% more LEs, a solution?
I had to get below ~97% before I'd get a fast compile. I could make two versions for the Nano - one, as is, and another with CORDIC removed and the other math put back in. The way it is now, it still runs the balls.spin demo.
The BEmicro board would accommodate a whole cog, barely.
Could be worth checking into when the SerDes (& Counters) are done.
I'm curious about the compile process...
What did you remove that affects the operating speed? Was it just that the fitter needed some "wiggle room" to efficiently lay out the circuits? Or was there something about the specific sections you removed?
Also, I noticed that you are targeting 80MHz, though the xtal is 50MHz. I'm assuming you're using a PLL megafunction. If so, how's it configured?
It's just that the compiler needs some wiggle room to get the speed up. No wiggle room means that some signals get routed a long way and slow the whole circuit down.
We are using the 50MHz input through a PLL to get 160MHz, which gets used in an NCO to make the Prop2 clock.
Sorry this documentation is taking so long. I'm going through the whole document making lots of changes to reflect how things are working now. I'm not even to the point of explaining the hub execution, although I don't think it will take long.
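For anyone wondering how an NCO turns the 160MHz PLL output into an 80MHz Prop2 clock, here is a minimal Python sketch of a phase-accumulator NCO. The 32-bit accumulator width, the use of the accumulator's top bit as the output clock, and the function names are all assumptions for illustration; the thread doesn't say how the real FPGA logic is built.

```python
REF_HZ = 160_000_000   # PLL output mentioned in the thread
ACC_BITS = 32          # assumed accumulator width (not stated in the thread)


def nco_step(f_out_hz, f_ref_hz=REF_HZ, acc_bits=ACC_BITS):
    """Tuning word to add on every reference clock for the wanted output rate."""
    return round(f_out_hz * (1 << acc_bits) / f_ref_hz)


def nco_output_hz(step, f_ref_hz=REF_HZ, acc_bits=ACC_BITS):
    """Average output frequency produced by a given tuning word."""
    return f_ref_hz * step / (1 << acc_bits)


def simulate_msb(step, ticks, acc_bits=ACC_BITS):
    """Clock the accumulator `ticks` times; return its top bit after each tick."""
    mask = (1 << acc_bits) - 1
    acc = 0
    out = []
    for _ in range(ticks):
        acc = (acc + step) & mask          # wraps around like a hardware register
        out.append(acc >> (acc_bits - 1))  # top bit serves as the output clock
    return out


step_80mhz = nco_step(80_000_000)
print(hex(step_80mhz), simulate_msb(step_80mhz, 4))
```

With these assumptions the 80MHz tuning word is exactly half the accumulator range, so the top bit toggles on every 160MHz tick. Other tuning words give other average frequencies, with some jitter when the ratio isn't a power of two.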
Self-hosted PASM? Perhaps I haven't given hub execution enough consideration. Until this minute, it didn't seem like my feature.
Is self-hosted PASM a likelihood?? Has it been mentioned before today? Would it be a third-party project like Spinix or an official Parallax tool? Too soon to say?
What does "self-hosted PASM" mean? Is it just a Prop assembler that runs on the Prop, or is it more like a Forth interpreter that can assemble PASM instructions?
Knowing Chip, and knowing PNut is written in x86 assembler, I suspect that he might write one in hubexec assembly code - he has been wanting self-hosted PASM and Spin for a long time.
Mind you, it really does not matter what the assembler is written in, as long as it is standard PASM syntax (instead of FORTH syntax). It would probably take less memory written in Forth.
Btw, I look forward to seeing your Forth running on the P2, it ought to be a blast.
Hmm... thinking about it, there is no reason why a Spin compiler could not also be written in Forth.
Porting Sphinx to the P2 also seems like a good idea.
Ahhhgg! Stop it already with the Forth thing. It gives me a headache.
David,
What does "self-hosted PASM" mean?
A very good question.
Historically that means we need a simple text editor. The assembler of course. And a crude OS making all that usable and able to load and run programs.
What we have here in the P2 is a device with 8 32-bit processors running AFAP into 256KB of RAM, together with video capabilities and an easy means of attaching gigabytes of storage on an SD card. Oh, and not to mention the 32MB RAM that looks like it will be supplied as standard on the first dev boards.
That makes the thing orders of magnitude more powerful than the old 8 bit personal computers or even the first IBM PC.
That would suggest this is all more than possible. All we need is a couple of turbo assembler programmers, in the style of Gary Kildall (CP/M, PL/M), Leor Zolman (BDS C compiler), or indeed Chip himself (PASM).
Whether it's worth anyone's time to do all this is another question, of course.
Comments
Walter
Can you describe the final configuration?
Last I heard:
- 256KB hub
- 4 lines icache with LRU (possibly increased to 8 lines)
- prefetch
- 1 line dcache
That's all correct. And it's 4 cogs on the DE2-115. I'll remove CTRB on the DE0-Nano compile to hopefully get a fit.
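To make the "4 lines icache with LRU" item concrete, here is a rough Python sketch of a tiny fully associative instruction cache with least-recently-used replacement. The 32-byte line size, the fully associative layout, and the class and method names are assumptions for illustration only; the thread confirms just the line count and the LRU policy.

```python
from collections import OrderedDict


class LruLineCache:
    """Toy model of a small instruction cache with LRU replacement."""

    def __init__(self, lines=4, line_bytes=32):
        self.lines = lines            # 4 lines, per the configuration above
        self.line_bytes = line_bytes  # assumed line size, not from the thread
        self.tags = OrderedDict()     # line tag -> None, least recent first

    def access(self, addr):
        """Return True on a hit; on a miss, load the line and return False."""
        tag = addr // self.line_bytes
        if tag in self.tags:
            self.tags.move_to_end(tag)     # mark as most recently used
            return True
        if len(self.tags) >= self.lines:
            self.tags.popitem(last=False)  # evict the least recently used line
        self.tags[tag] = None
        return False


cache = LruLineCache()
for addr in (0, 4, 32, 64, 96, 0, 128, 32):
    print(addr, "hit" if cache.access(addr) else "miss")
```

In this model a loop that fits within four lines hits on every fetch after the first pass, while code that jumps between more than four lines keeps evicting the oldest one.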
If it is not enough to remove CTRB, perhaps remove CORDIC?
One test I am planning is a mini-P2 network over the new high-speed UARTs between my DE2-115 and two of my DE0-Nanos...
I wonder if the UARTs are subject to change with the possible SERDES work, or will those stay separate?
C.W.
Will it be possible to use even the spare PORT C bits internally?
The Nano is an obvious target, but there is also a low cost Cyclone V board with 12% more LEs
Besides the 12% extra size, the speed from a Cyclone V build would be interesting to see.
Or, optimizing for size may give enough room, but the Nano would then be slightly slower.
Removing CORDIC could make more sense, as peripheral and Counter testing will need a burst of activity.
P2 Counter docs are late arriving, so that compresses any testing time.
I too hope that the Nanos will still get an update.
That low cost board is interesting, and I also find the new Cyclone V GX Starter Kit for $179 VERY interesting. If Chip could map the HDMI output to the Prop component video mode, and the DDR2 to the Prop's DDR, it could be a very nice 2-cog, 256KB-hub platform with no expansion board required.
http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=167&No=830&PartNo=1
(replying to your next message)
I am REALLY looking forward to what Chip does for Serdes.
At a minimum I'd like to see as fast as possible SPI master and slave modes; I2S and QSPI would be a nice bonus.
For the DE0-Nano, 1 cog configuration, the following were removed:
32x32 multiply
64/32 divide
square root
SERB
CTRB
CTRA's function generator
I just need now to update the docs to cover the hub execution mode, which is working like a dream. It's really nice how you can call and jump anywhere, and in and out of cog and hub spaces. It's a whole new world for me. At this point, I could write on-chip tools in PASM, without needing to get a Spin compiler working first to accommodate the large code needed.
Hopefully, by tonight I'll have the update posted.
Thanks for your patience, Everyone.
Why do I have a feeling that a PASM (written in hubexec) is on the horizon?
And it is a new world for everyone...
'Barely' is still a fit, but it needs to still fit after the SerDes is expanded.
Have you done any Cyclone V builds yet, to get a handle on the speed change from Cyclone IV?