Propeller II update - BLOG

wmosscrop · 2014-01-21 10:42

Tor wrote: »

On the other hand.. instructions were also added to processor architectures with the intention to 'help' compiler writers.

Back in the days of the minicomputers, Data General implemented their MV-Series CPU on a single microcoded chip.

Unfortunately, there wasn't enough microcode space to implement all of the instructions. So, the designers chose some instructions that would be emulated by the CPU instead.

Of course, the emulation would be much slower than when implemented in microcode, but if it's for instructions that aren't used that often, it won't make a difference.

Unless you pick the wrong instructions.

Apparently the designers didn't look at the code generated by all of their compilers... and the COBOL compiler output (quite often) one of those "helpful" instructions (I believe it was a numeric-to-text formatter, but I'm not sure).

The end result was that some COBOL programs ran about 6X SLOWER on the new hardware.

Walter

cgracey · 2014-01-22 10:32

I had to make lots of assembler changes to properly support hub exec mode. I think it's all done now. I'm just recompiling the FPGA, since I found a minor bug. Hopefully, I'll have an FPGA update in a day, or two.

mindrobots · 2014-01-22 10:40

cgracey wrote: »

I had to make lots of assembler changes to properly support hub exec mode. I think it's all done now. I'm just recompiling the FPGA, since I found a minor bug. Hopefully, I'll have an FPGA update in a day, or two.

Super, Chip!

Do you think we'll still be able to have something that fits into a Nano or are we all destined to upgrade to a DE2-115 or drop out of the testing program?

Bill Henning · 2014-01-22 10:41

Thank you for the update - I have dusted off my DE2-115 in preparation for playing with hubexec

Can you describe the final configuration?

Last I heard:

- 256KB hub
- 4 lines icache with LRU (possibly increased to 8 lines)
- prefetch
- 1 line dcache

cgracey wrote: »

I had to make lots of assembler changes to properly support hub exec mode. I think it's all done now. I'm just recompiling the FPGA, since I found a minor bug. Hopefully, I'll have an FPGA update in a day, or two.

cgracey · 2014-01-22 11:18

Bill Henning wrote: »

Thank you for the update - I have dusted off my DE2-115 in preparation for playing with hubexec

Can you describe the final configuration?

Last I heard:

- 256KB hub
- 4 lines icache with LRU (possibly increased to 8 lines)
- prefetch
- 1 line dcache

That's all correct. And it's 4 cogs on the DE2-115. I'll remove CTRB on the DE0-Nano compile to hopefully get a fit.
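
For anyone who wants to get a feel for what "4 lines icache with LRU (possibly increased to 8 lines)" implies, here is a toy Python model of a small instruction cache with least-recently-used replacement. It is only a sketch: the fully associative lookup, the 32-byte line size, and the 4-byte instruction stride are assumptions made for illustration, and none of them is stated in this thread.

```python
from collections import OrderedDict


class LRUICache:
    """Toy fully associative instruction cache with LRU replacement."""

    def __init__(self, num_lines=4, line_bytes=32):
        self.num_lines = num_lines
        self.line_bytes = line_bytes  # assumed line size, not from the thread
        self.lines = OrderedDict()    # line tag -> True, least recent first
        self.hits = 0
        self.misses = 0

    def fetch(self, address):
        """Record one instruction fetch; return True on a cache hit."""
        tag = address // self.line_bytes
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)   # evict the least recently used line
        self.lines[tag] = True
        return False


def run_trace(cache, addresses):
    """Feed a sequence of fetch addresses; return (hits, misses)."""
    for address in addresses:
        cache.fetch(address)
    return cache.hits, cache.misses


# A loop body spanning five assumed 32-byte lines, executed ten times.
loop_trace = [addr for _ in range(10) for addr in range(0, 160, 4)]
four_lines = run_trace(LRUICache(num_lines=4), loop_trace)
eight_lines = run_trace(LRUICache(num_lines=8), loop_trace)
print("4 lines (hits, misses):", four_lines)
print("8 lines (hits, misses):", eight_lines)
```

In this model a loop slightly larger than the cache reloads every line on every pass under LRU, while the 8-line variant misses only on the first pass. That is one reason a jump from 4 to 8 lines could matter for hub exec loops.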

Bill Henning · 2014-01-22 11:29

It would be really nice if it fits in the Nano.

If it is not enough to remove CTRB, perhaps remove CORDIC?

One test I am planning is a mini-P2 network over the new high speed uarts between my DE2-115 and two of my DE0-Nanos...

cgracey wrote: »

That's all correct. And it's 4 cogs on the DE2-115. I'll remove CTRB on the DE0-Nano compile to hopefully get a fit.

ctwardell · 2014-01-22 11:32

Bill Henning wrote: »

It would be really nice if it fits in the Nano.

If it is not enough to remove CTRB, perhaps remove CORDIC?

One test I am planning is a mini-P2 network over the new high speed uarts between my DE2-115 and two of my DE0-Nanos...

I wonder if the UARTs are subject to change with the possible SERDES work, or will those stay separate?

C.W.

Sapieha · 2014-01-22 11:41

Hi Chip.

Will it be possible to use even the spare PORT C bits internally?

cgracey wrote: »

I had to make lots of assembler changes to properly support hub exec mode. I think it's all done now. I'm just recompiling the FPGA, since I found a minor bug. Hopefully, I'll have an FPGA update in a day, or two.

Bill Henning · 2014-01-22 11:42

Last I heard, the uarts are staying, and SERDES will likely be an alternate mode, with shared circuitry. I am eagerly awaiting what Chip cooks up

ctwardell wrote: »

I wonder if the UARTs are subject to change with the possible SERDES work, or will those stay separate?

C.W.

jmg · 2014-01-22 11:42

Bill Henning wrote: »

It would be really nice if it fits in the Nano.

The Nano is an obvious target, but there is also a low cost Cyclone V board with 12% more LEs

Besides the 12% extra size, the speed from a Cyclone V build would be interesting to see.

Or, optimizing for size may free up enough, but the Nano would then be slightly slower.

Bill Henning wrote: »

If it is not enough to remove CTRB, perhaps remove CORDIC?

Removing CORDIC could make more sense, as peripheral and Counter testing will need a burst of activity.

P2 Counter docs are late arriving, so that compresses any testing time.

jmg · 2014-01-22 11:47

Bill Henning wrote: »

Last I heard, the uarts are staying, and SERDES will likely be an alternate mode, with shared circuitry. I am eagerly awaiting what Chip cooks up

Yes, needs more user control and Sync modes : SPI, I2S, QuadSPI (& JTAG?) with granular Baud/Length choices and 50MHz+ would be a solid base. Additional bit-level support for some USB primitives might also make it.

Bill Henning · 2014-01-24 07:43

I am having trouble waiting for the next bitstream

I too hope that the Nanos will still get an update.

That low cost board is interesting, and I also find the new Cyclone V GX Starter Kit for $179 VERY interesting. If Chip could map the HDMI output to the Prop component video mode, and the DDR2 to prop's DDR, it could be a very nice 2 cog 256KB hub no expansion board required platform.

http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=167&No=830&PartNo=1

(replying to your next message)

I am REALLY looking forward to what Chip does for Serdes.

At a minimum I'd like to see as fast as possible SPI master and slave modes; I2S and QSPI would be a nice bonus.

jmg wrote: »

The Nano is an obvious target, but there is also a low cost Cyclone V board with 12% more LEs

Besides the 12% extra size, the speed from a Cyclone V build would be interesting to see.

Or, optimizing for size may free up enough, but the Nano would then be slightly slower.

Removing CORDIC could make more sense, as peripheral and Counter testing will need a burst of activity.

P2 Counter docs are late arriving, so that compresses any testing time.

cgracey · 2014-01-24 14:58

I was recompiling all yesterday for the DE0-Nano. I had to remove more than was required to simply fit the circuitry, in order to get a high-speed fit that will run at 80MHz reliably.

For the DE0-Nano, 1 cog configuration, the following were removed:

32x32 multiply
64/32 divide
square root
SERB
CTRB
CTRA's function generator

I just need now to update the docs to cover the hub execution mode, which is working like a dream. It's really nice how you can call and jump anywhere, and in and out of cog and hub spaces. It's a whole new world for me. At this point, I could write on-chip tools in PASM, without needing to get a Spin compiler working first to accommodate the large code needed.

Hopefully, by tonight I'll have the update posted.

Thanks for your patience, Everyone.

Bill Henning · 2014-01-24 15:01

Excellent news!

Why do I have a feeling that a PASM (written in hubexec) is on the horizon?

And it is a new world for everyone...

mindrobots · 2014-01-24 15:03

Great news! At least something will run in our Nanos!

cgracey wrote: »

... the hub execution mode, which is working like a dream. It's really nice how you can call and jump anywhere, and in and out of cog and hub spaces. It's a whole new world for me.

Careful there, Chip, next thing you know you'll sneak in an interrupt or two!!

jmg · 2014-01-24 15:33

cgracey wrote: »

I was recompiling all yesterday for the DE0-Nano. I had to remove more than was required to simply fit the circuitry, in order to get a high-speed fit that will run at 80MHz reliably.

For the DE0-Nano, 1 cog configuration, the following were removed:

32x32 multiply
64/32 divide
square root
SERB
CTRB
CTRA's function generator

Shame to lose 32x32 multiply, 64/32 divide, SERB, and CTRB.

How much extra space does relaxing the speed emphasis give you?
Is CORDIC not as costly as it sounds, or just harder to remove?

How many more LEs does it need for all the options - is the BEmicro board, with 12% more LEs, a solution?

cgracey · 2014-01-24 15:51

jmg wrote: »

Shame to lose 32x32 multiply, 64/32 divide, SERB, and CTRB.

How much extra space does relaxing the speed emphasis give you?
Is CORDIC not as costly as it sounds, or just harder to remove?

How many more LEs does it need for all the options - is the BEmicro board, with 12% more LEs, a solution?

I had to get below ~97% before I'd get a fast compile. I could make two versions for the Nano - one, as is, and another with CORDIC removed and the other math put back in. The way it is now, it still runs the balls.spin demo.

The BEmicro board would accommodate a whole cog, barely.
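
To put rough numbers on "barely", here is a small Python sketch of the logic-element budget implied by the ~97% utilization ceiling and the "12% more LEs" figure mentioned in this thread. The DE0-Nano LE count of 22,320 is an assumption taken from the usual figure for its Cyclone IV EP4CE22 device, not something stated here, so check it against the datasheet before relying on it.

```python
def usable_les(total_les, max_utilization=0.97):
    """LEs available if the fitter needs utilization kept under the ceiling."""
    return int(total_les * max_utilization)


def extra_headroom(base_les, growth=0.12, max_utilization=0.97):
    """Additional usable LEs on a device with `growth` more LEs than the base."""
    bigger = round(base_les * (1 + growth))
    return usable_les(bigger, max_utilization) - usable_les(base_les, max_utilization)


NANO_LES = 22_320  # assumed LE count for the DE0-Nano, not from the thread

print("Nano usable LEs at 97%:", usable_les(NANO_LES))
print("Extra usable LEs with 12% more:", extra_headroom(NANO_LES))
```

Under these assumptions the larger part adds only a few thousand usable LEs, which fits the picture of a full cog fitting with very little to spare.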

rjo__ · 2014-01-24 15:57

So, so good:)

Heater. · 2014-01-24 16:02

Fantastic. Hub exec sounds great. Yep, I can see a self hosted PASM on the horizon.

jmg · 2014-01-24 17:22

cgracey wrote: »

I had to get below ~97% before I'd get a fast compile. I could make two versions for the Nano - one, as is, and another with CORDIC removed and the other math put back in. The way it is now, it still runs the balls.spin demo.

Could be worth checking into, when the SerDes is done ( & Counters)

cgracey wrote: »

The BEmicro board would accommodate a whole cog, barely.

'barely' is still a fit

- but it needs to still fit after the serDes is expanded.

Have you done any Cyclone V builds yet, to get a handle on the speed-change from Cyclone IV ?

Baggers · 2014-01-25 04:52

That's awesome news Chip, fun times ahead

Seairth · 2014-01-25 05:34

cgracey wrote: »

I had to get below ~97% before I'd get a fast compile. I could make two versions for the Nano - one, as is, and another with CORDIC removed and the other math put back in. The way it is now, it still runs the balls.spin demo.

The BEmicro board would accommodate a whole cog, barely.

I'm curious about the compile process...

What did you remove that affects the operating speed? Was it just that the fitter needed some "wiggle room" to efficiently lay out the circuits? Or was there something about the specific sections you removed?

Also, I noticed that you are targeting 80MHz, though the xtal is 50MHz. I'm assuming you're using a PLL megafunction. If so, how's it configured?

cgracey · 2014-01-25 06:35

Seairth wrote: »

I'm curious about the compile process...

What did you remove that affects the operating speed? Was it just that the fitter needed some "wiggle room" to efficiently lay out the circuits? Or was there something about the specific sections you removed?

Also, I noticed that you are targeting 80MHz, though the xtal is 50MHz. I'm assuming you're using a PLL megafunction. If so, how's it configured?

It is that the compiler needs some wiggle room to get the speed up. No wiggle room means that some signals get routed a long way and slow the whole circuit down.

We are using the 50MHz input through a PLL to get 160MHz, which gets used in an NCO to make the Prop2 clock.

Sorry this documentation is taking so long. I'm going through the whole document making lots of changes to reflect how things are working now. I'm not even to the point of explaining the hub execution, although I don't think it will take long.
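
As a rough illustration of the clocking described above (50MHz through a PLL to 160MHz, then an NCO to make the Prop2 clock), here is a Python sketch of a generic phase-accumulator NCO. The 32-bit accumulator width and the use of the accumulator's top bit as the output clock are assumptions; the thread does not describe how the actual NCO is built.

```python
def nco_tuning_word(f_out_hz, f_ref_hz, acc_bits=32):
    """Tuning word that makes the accumulator wrap f_out_hz times per second."""
    return (f_out_hz * (1 << acc_bits) + f_ref_hz // 2) // f_ref_hz


def nco_output_hz(word, f_ref_hz, acc_bits=32):
    """Average output frequency for a given tuning word."""
    return word * f_ref_hz / (1 << acc_bits)


def simulate_nco(word, ref_cycles, acc_bits=32):
    """Count rising edges of the accumulator's top bit over ref_cycles."""
    mask = (1 << acc_bits) - 1
    acc = 0
    prev_msb = 0
    edges = 0
    for _ in range(ref_cycles):
        acc = (acc + word) & mask        # phase accumulator wraps modulo 2**acc_bits
        msb = acc >> (acc_bits - 1)      # top bit is taken as the output clock (assumed)
        if msb and not prev_msb:
            edges += 1
        prev_msb = msb
    return edges


F_REF = 160_000_000                      # PLL output quoted in the thread
word = nco_tuning_word(80_000_000, F_REF)
print(hex(word), nco_output_hz(word, F_REF))
# 160 reference cycles is 1 microsecond at 160MHz, so edges == output MHz
print(simulate_nco(word, 160))
```

With a 160MHz reference, the highest clock this kind of NCO can produce from its top bit is half the reference, which lines up with the 80MHz target mentioned earlier. Lower tuning words give slower clocks.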

Bill Henning · 2014-01-25 06:43

No worries... it is well worth the wait

cgracey wrote: »

It is that the compiler needs some wiggle room to get the speed up. No wiggle room means that some signals get routed a long way and slow the whole circuit down.

We are using the 50MHz input through a PLL to get 160MHz, which gets used in an NCO to make the Prop2 clock.

Sorry this documentation is taking so long. I'm going through the whole document making lots of changes to reflect how things are working now. I'm not even to the point of explaining the hub execution, although I don't think it will take long.

potatohead · 2014-01-25 07:38

Given the scope of the change, I think the time elapsed so far is nothing short of amazing. No worries here. It takes the time it takes.

User Name · 2014-01-25 07:56

Self-hosted PASM? Perhaps I haven't given hub execution enough consideration. Until this minute, it didn't seem like my feature.

Is self-hosted PASM a likelihood?? Has it been mentioned before today? Would it be a third-party project like Spinix or an official Parallax tool? Too soon to say?

Dave Hein · 2014-01-25 08:04

What does "self-hosted PASM" mean? Is it just a Prop assembler that runs on the Prop, or is it more like a Forth interpreter that can assemble PASM instructions?

Bill Henning · 2014-01-25 08:16

Knowing Chip, and knowing PNut is written in x86 assembler, I suspect that he might write one in hubexec assembly code - he has been wanting self-hosted PASM and Spin for a long time.

Mind you, it really does not matter what the assembler is written in, as long as it is standard PASM syntax (instead of FORTH syntax). It would probably take less memory written in Forth.

Btw, I look forward to seeing your Forth running on the P2, it ought to be a blast.

Hmm... thinking about it, there is no reason why a Spin compiler could not also be written in Forth.

Porting Sphinx to the P2 also seems like a good idea.

potatohead · 2014-01-25 09:20

I think that is precisely what Chip will do with Hubex.

Heater. · 2014-01-25 09:24

Ahhhgg! Stop it already with the Forth thing. It gives me a headache.

David,

What does "self-hosted PASM" mean?

A very good question.

Historically that means we need a simple text editor. The assembler of course. And a crude OS making all that usable and able to load and run programs.

What we have here in the P2 is a device with 8 32-bit processors running AFAP into 256KB of RAM, together with video capabilities and an easy means of attaching gigabytes of storage on an SD card. Oh, and not to mention the 32MB of RAM that looks like it will be supplied as standard on the first dev boards.

That makes the thing orders of magnitude more powerful than the old 8 bit personal computers or even the first IBM PC.

That would suggest this is all more than possible. All we need is a couple of turbo assembler programmers, in the style of Gary Kildall (CP/M, PL/M), Leor Zolman (BDS C compiler), or indeed Chip himself (PASM).

Whether it's worth anyone's time to do all this is another question, of course.
