Big update for DE2-115 and DE0-Nano users w/add-on boards

David Betz · 2013-10-07 11:19

rod1963 wrote: »

Just wait until Chip is done with the P2 and it's in production.

Until then just learn how to reverse engineer what is known about the P2 and incorporate it into your idea of what a P2 should be.

Unfortunately that's beyond my ability or time right now. Cloning something as complicated as the P2 COG is a lot different than tweaking the internals of the real one.

jmg · 2013-10-07 13:18

Heater. wrote: »

This whole open hardware at the chip level has never been done before and may need some thinking about.

That's not quite correct:
http://en.wikipedia.org/wiki/OpenSPARC
and also see the links...

I think because any FPGA will dominate the cost, and because a bare FPGA is no use to anyone, this is best released as
Software + PCB Module
(which already fits Parallax's business model).

That prevents too much diversion and fragmentation and dilution, and ensures download and run.

Also natural on a Parallax module, is to include a Prop chip, as that will always cost less than the same thing in FPGA.
That can allow a smaller, cheaper FPGA option, but still give under-the-bonnet changes & additions.

Many users would be fine to have a Prop core they can add stuff onto, which does not require top level source.

Ken Gracey · 2013-10-07 14:45

After reading the last couple of pages I'm possibly inspired by giving away P3 cores, and having Chip be the arbiter of what goes into our design.

There's one important reason in my view that the idea could have merit: time (time to market; opportunity cost of time; time is money; etc). Parallax can't sustain an eight-year design cycle easily. By the time we finish such major projects the market has changed, new competitors arrive, faster processes may obsolete our initial targeted process, more application-specific chips are produced, etc. Next, chip design and fabrication is a very costly business; without P2 R&D we would be a very profitable company yet with a limited lifespan assuming no R&D efforts, or highly profitable with a "practical" R&D effort. With these two considerations in mind, I have encouraged and supported Chip for many years to (a) expand his team and (b) release smaller design iterations - variants, without sacrificing the "why the Propeller works" philosophy with a watered-down selection of chips.

The open P3 core proposal actually expands Chip's team and may encourage the release of more iterations, more quickly. This meets some internal goals, actually. Heck, they aren't even "goals" - they're just practical business requirements that are relevant to staying productive.

Some things to consider:

- Open and popular silicon is guaranteed to be copied by Chinese, regardless of any license should it exist. We wind up in a situation (at least outside of USA) where Arduino is today: clones, copies, etc. with their pickup of a fringe business of official licensed product (I assume it's working this way, but don't know for certain).
- Reliance on Education and Hobby for primary revenue. We are a winner in Education, and plan on being so for the next 30 years. We will always produce quality curriculum and programs for schools, and we find this very rewarding. Can enough profit be derived from the boards and robots? Hardware is now created by non-profits, open-source communities, and mega-businesses like Intel who could give away 50K boards. If we have no IP that's clearly our own, we'll be in this mix too....
- Chip and Parallax warrant financial return from all of our work, especially if we want to continue to do R&D and fabricate.
- P2 will be a financial success, and if it's big enough that we can recover our investment AND meet other business goals, then we may not rely on P3 for such high financial return, enabling more options like the ones discussed above.

Like Heater said about being open source or existing as a burdened licensed release, but rephrased in my own words: "you're in or you're out" when it comes to these decisions. Grey zones around licenses are confusing and costly in terms of communication clarity.

You are all encouraged to discuss this issue and we'll harvest it from time to time. For now, Parallax needs P2 finished!

David Betz · 2013-10-07 15:11

Ken Gracey wrote: »

After reading the last couple of pages I'm possibly inspired by giving away P3 cores, and having Chip be the arbiter of what goes into our design.

I'm looking forward to this! What can we do here to make sure P2 is successful?

Ken Gracey · 2013-10-07 15:23

David Betz wrote: »

I'm looking forward to this! What can we do here to make sure P2 is successful?

When I read these forums, I'm really inspired by the people who provide such important technical contributions. Everybody brings a different value - business, engineering, fabrication, marketing - and your help shines with customer support, compiler development, and SimpleIDE input. When Propeller 2 comes out of fabrication with a successful run, our challenge will be bringing it to market in a professional manner. We will need your help with documentation development, application and code, customer support, tools, compilers and hardware designs.

It is my expectation that we will integrate the efforts of others to make this a success, much more than our first time around. We've already got a few doc development tools in place to incorporate expertise and effort from people beyond our internal team. Our team will include all of you, coordinated to make our output more uniform and seamless.

Baggers · 2013-10-07 15:52

Thanks all, for the tips,

But all to no avail I'm afraid

it's still not showing a display, pressing F10, or F11.
Cogs are running, just not getting a signal from the VGA.
I'll look at forcing 80MHz once the cog starts, see if that helps.
I tried both Chip's graphics6.spin, and ozpropdev's spacies game, both show no results on VGA

Cheers,
Jim.

Cluso99 wrote: »

Baggers:
A few things that may help...
F11 is the compile and download
You need to use orgh $e80 and don't fill with 0's to $E80
Rom monitor now starts at $700 not $70C
Ctl-M now compiles (was Ctl-L)
Your code must now start from $E80 (I used P2load to start my program at $1000 and filled $e80-$FFF with zeros)
Currently I am not using P2Load
pnut can also use spin2 but is not fully debugged or capable

Let us know what you have to do so that I can update the sticky

potatohead · 2013-10-07 16:37

Do you have a TV cable connected? I find I get no signal, unless I connect one or the other. Pull all your RCA plugs.

Cluso99 · 2013-10-07 18:21

Ken: Your posts are truly inspirational. Never before has a company been so open with its "friends" here on the forum.

It's our job (I am sure, speaking for all of us on the forum) to promote the P1 & P2 and help with testing, documents, software etc. It's our gratitude for a truly deserving company.

After the P2 is done, fleshing out how Parallax can maintain control, while opening up as much as possible, will certainly be a challenge, while at the same time preventing a chip competitor with deep pockets from taking these ideas and rolling their own, effectively hijacking the P3.

User Name · 2013-10-07 19:57

Cluso99 wrote: »

...fleshing out how Parallax can maintain control, while opening up as much as possible, will certainly be a challenge, while at the same time preventing a chip competitor with deep pockets from taking these ideas and rolling their own, effectively hijacking the P3.

Perhaps so, but I think Seairth, in post 331, makes some good points. I'd imagine that most organizations with pockets deep enough to fund a fab run probably have their own ideas of how to maximize the return on their investment. Only when a thriving market is established would there be much of a chance of IP theft or copyright infringement, and even then it's most likely to be limited to China.

potatohead · 2013-10-07 20:17

I just ran some code tonight and got no signal. Had to seat the add-on board. FWIW.

Coley · 2013-10-08 14:10

Only just got around to loading the new config into the DE2, it worked just fine for me, Chip's demo was once again mesmerizing ;-)

Baggers wrote: »

Thanks all, for the tips,

But all to no avail I'm afraid it's still not showing a display, pressing F10, or F11.
Cogs are running, just not getting a signal from the VGA.
I'll look at forcing 80MHz once the cog starts, see if that helps.
I tried both Chip's graphics6.spin, and ozpropdev's spacies game, both show no results on VGA

Cheers,
Jim.

potatohead · 2013-10-08 19:52

Could the marginal timing he mentioned be affecting Baggers' DE2?

Baggers · 2013-10-09 00:35

potatohead wrote: »

Could the marginal timing he mentioned be affecting Baggers' DE2?

I guess it's possible, as I got a copy of Coley's FPGA files, as his worked, to see if mine was somehow corrupt, yes, I know it wouldn't be, but you never know.

Installed and verified ok.

Compiling and uploading files to the P2 works fine.
Chip's demo starts with three cogs lit, then down to two, at which time the VGA should kick in, but doesn't

F10 or F11 on either Chip's demo, or Space Invaders wouldn't work.

The VGA cable and monitor work: I tried another device on the same cable and it worked.

cgracey · 2013-10-09 00:50

Baggers,

I don't know what the problem is, but I'll have another update in a few days with verified timing. I've been chasing this FPGA timing issue for years, and just when I think I've nailed it, it goes south again. At this point, I can tell after a compilation whether it's going to work by checking a few critical path delays: the internal clock delay and the clock-source to output time - they need to be within a nanosecond of each other. I found a set_output_delay command that seemed to bring them in line, but in some compilations, it gets all out of whack again. It's more consistent with the command, anyway. Altera's docs are verbose, but seem to miss lots of details.

I've been making a lot of changes. I've got the INx and OUTx separated again, like David Betz and others wanted. I wanted it, too, because it was making Spin need separate names for the same things and I got stumped a few times by PINx.

Many D,S instructions which pass data with D can now have #D for immediate values. For example, you can now do things like WRLONG #0,address. I also extended WAITVID's #D so that it's 12 bits. No more need for constants sitting in registers for video timing.

I pulled out SETF/MOVF and made a set of instructions which can move any byte to any byte, same with words. All immediate, so it's non-modal, unlike MOVF was. This is more multi-tasking friendly.

Also, the stack is now randomly addressable, as well as by SPA/SPB with PTRA/PTRB-type expressions.

There's more stuff, too, but I can't remember right now. I worked 36 hours straight twice last week.
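To make the "move any byte to any byte" and 12-bit #D ideas concrete, here is a minimal Python model. It is only a sketch of the behaviour as described in the post: the function names are hypothetical and are not the actual P2 mnemonics or encodings, and registers are assumed to be 32-bit longs.

```python
MASK32 = 0xFFFFFFFF  # registers are modelled as 32-bit longs (assumption)


def move_byte(d: int, dst_idx: int, s: int, src_idx: int) -> int:
    """Return d with byte dst_idx replaced by byte src_idx of s.

    Both indices are given with the call itself, so nothing depends on a
    previously configured mode (the "non-modal" property in the post).
    """
    if not (0 <= dst_idx <= 3 and 0 <= src_idx <= 3):
        raise ValueError("byte index must be 0..3")
    byte = (s >> (8 * src_idx)) & 0xFF
    mask = 0xFF << (8 * dst_idx)
    return ((d & ~mask) | (byte << (8 * dst_idx))) & MASK32


def move_word(d: int, dst_idx: int, s: int, src_idx: int) -> int:
    """Same idea for 16-bit words; indices are 0 (low) or 1 (high)."""
    if dst_idx not in (0, 1) or src_idx not in (0, 1):
        raise ValueError("word index must be 0 or 1")
    word = (s >> (16 * src_idx)) & 0xFFFF
    mask = 0xFFFF << (16 * dst_idx)
    return ((d & ~mask) | (word << (16 * dst_idx))) & MASK32


def fits_immediate(value: int, bits: int) -> bool:
    """True if value can be encoded as an unsigned immediate of that width."""
    return 0 <= value < (1 << bits)


# Low byte of the source lands in the top byte of the destination.
top_replaced = move_byte(0x11223344, 3, 0xAABBCCDD, 0)
# High word of the source lands in the low word of the destination.
low_word_replaced = move_word(0x11223344, 0, 0xAABBCCDD, 1)
# A 12-bit #D covers 0..4095.
waitvid_ok = fits_immediate(4095, 12)
waitvid_too_big = fits_immediate(4096, 12)
```

Because every call carries its own source and destination index, two tasks interleaving such operations cannot disturb each other's field selection, which is presumably the multi-tasking benefit mentioned above.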

Sapieha · 2013-10-09 00:56

Hi Chip.

Nice progress.

cgracey wrote: »

Baggers,

I don't know what the problem is, but I'll have another update in a few days with verified timing. I've been chasing this FPGA timing issue for years, and just when I think I've nailed it, it goes south again. At this point, I can tell after a compilation whether it's going to work by checking a few critical path delays: the internal clock delay and the clock-source to output time - they need to be within a nanosecond of each other. I found a set_output_delay command that seemed to bring them in line, but in some compilations, it gets all out of whack again. It's more consistent with the command, anyway. Altera's docs are verbose, but seem to miss lots of details.

I've been making a lot of changes. I've got the INx and OUTx separated again, like David Betz and others wanted. I wanted it, too, because it was making Spin need separate names for the same things and I got stumped a few times by PINx.

Many D,S instructions which pass data with D can now have #D for immediate values. For example, you can now do things like WRLONG #0,address. I also extended WAITVID's #D so that it's 12 bits. No more need for constants sitting in registers for video timing.

I pulled out SETF/MOVF and made a set of instructions which can move any byte to any byte, same with words. All immediate, so it's non-modal, unlike MOVF was. This is more multi-tasking friendly.

Also, the stack is now randomly addressable, as well as by SPA/SPB with PTRA/PTRB-type expressions.

There's more stuff, too, but I can't remember right now. I worked 36 hours straight twice last week.

Sapieha · 2013-10-09 01:04

Hi Chip.

When I need paths like that, I make them as blocks that are placed in positions on the FPGA that give the shortest connections.

cgracey wrote: »

Baggers,

I don't know what the problem is, but I'll have another update in a few days with verified timing. I've been chasing this FPGA timing issue for years, and just when I think I've nailed it, it goes south again. At this point, I can tell after a compilation whether it's going to work by checking a few critical path delays: the internal clock delay and the clock-source to output time - they need to be within a nanosecond of each other. I found a set_output_delay command that seemed to bring them in line, but in some compilations, it gets all out of whack again. It's more consistent with the command, anyway. Altera's docs are verbose, but seem to miss lots of details.

I've been making a lot of changes. I've got the INx and OUTx separated again, like David Betz and others wanted. I wanted it, too, because it was making Spin need separate names for the same things and I got stumped a few times by PINx.

Many D,S instructions which pass data with D can now have #D for immediate values. For example, you can now do things like WRLONG #0,address. I also extended WAITVID's #D so that it's 12 bits. No more need for constants sitting in registers for video timing.

I pulled out SETF/MOVF and made a set of instructions which can move any byte to any byte, same with words. All immediate, so it's non-modal, unlike MOVF was. This is more multi-tasking friendly.

Also, the stack is now randomly addressable, as well as by SPA/SPB with PTRA/PTRB-type expressions.

There's more stuff, too, but I can't remember right now. I worked 36 hours straight twice last week.

Roy Eltham · 2013-10-09 01:36

Chip,
With INx and OUTx back, does that mean the last 12 registers are "special" now instead of just the last 8?

ozpropdev · 2013-10-09 02:35

@Baggers
I'm running a DE0 with the latest FPGA version and a DE2 with the previous FPGA version. The new Pnut won't work with my DE2, so I have to use the previous version with it. Two different Pnuts for two different boards. It's caught me out a couple of times.
FYI FWIW
Cheers
Brian

Edit: Doh! My latest "Invaders" won't compile in the previous Pnut anyway! Ignore my suggestion

ozpropdev · 2013-10-09 02:44

cgracey wrote: »

Many D,S instructions which pass data with D can now have #D for immediate values. For example, you can now do things like WRLONG #0,address. I also extended WAITVID's #D so that it's 12 bits. No more need for constants sitting in registers for video timing.

I pulled out SETF/MOVF and made a set of instructions which can move any byte to any byte, same with words. All immediate, so it's non-modal, unlike MOVF was. This is more multi-tasking friendly.

Also, the stack is now randomly addressable, as well as by SPA/SPB with PTRA/PTRB-type expressions.

There's more stuff, too, but I can't remember right now. I worked 36 hours straight twice last week.

Nice work Chip!
Looking forward to coding with the new changes....
Cheers
Brian

cgracey · 2013-10-09 10:19

Roy Eltham wrote: »

Chip,
With INx and OUTx back, does that mean the last 12 registers are "special" now instead of just the last 8?

That's right. Plus there are two just below them that are INDA and INDB.
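As a rough picture of the register layout this implies, the sketch below computes the address ranges. It assumes a 512-long cog register space ($000-$1FF), which is not stated in this thread, and it does not try to say which of the two lower slots is INDA and which is INDB.

```python
COG_REGISTERS = 512   # assumption: cog register space is 512 longs ($000-$1FF)
SPECIAL_COUNT = 12    # from the thread: the last 12 registers are "special"
IND_COUNT = 2         # from the thread: INDA and INDB sit just below them

# Address range of the special registers at the top of the cog.
special = range(COG_REGISTERS - SPECIAL_COUNT, COG_REGISTERS)

# The two indirect registers directly below the special block.
indirect = range(special.start - IND_COUNT, special.start)

# Everything below that is left for code and data.
general_purpose = range(0, indirect.start)

print(f"special:  ${special.start:03X}-${special.stop - 1:03X}")
print(f"indirect: ${indirect.start:03X}-${indirect.stop - 1:03X}")
print(f"general:  ${general_purpose.start:03X}-${general_purpose.stop - 1:03X}")
```

Under that assumption the change from 8 to 12 special registers costs four general-purpose longs per cog.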

David Betz · 2013-10-09 10:33

cgracey wrote: »

I've got the INx and OUTx separated again, like David Betz and others wanted. I wanted it, too, because it was making Spin need separate names for the same things and I got stumped a few times by PINx.

Thanks Chip!!

Cluso99 · 2013-10-09 17:50

Thanks Chip. These changes sound great.

Bill Henning · 2013-10-09 18:05

Hmm... you have to catch up with your sleep!

On the other hand the new changes sound great

cgracey wrote: »

Baggers,

I don't know what the problem is, but I'll have another update in a few days with verified timing. I've been chasing this FPGA timing issue for years, and just when I think I've nailed it, it goes south again. At this point, I can tell after a compilation whether it's going to work by checking a few critical path delays: the internal clock delay and the clock-source to output time - they need to be within a nanosecond of each other. I found a set_output_delay command that seemed to bring them in line, but in some compilations, it gets all out of whack again. It's more consistent with the command, anyway. Altera's docs are verbose, but seem to miss lots of details.

I've been making a lot of changes. I've got the INx and OUTx separated again, like David Betz and others wanted. I wanted it, too, because it was making Spin need separate names for the same things and I got stumped a few times by PINx.

Many D,S instructions which pass data with D can now have #D for immediate values. For example, you can now do things like WRLONG #0,address. I also extended WAITVID's #D so that it's 12 bits. No more need for constants sitting in registers for video timing.

I pulled out SETF/MOVF and made a set of instructions which can move any byte to any byte, same with words. All immediate, so it's non-modal, unlike MOVF was. This is more multi-tasking friendly.

Also, the stack is now randomly addressable, as well as by SPA/SPB with PTRA/PTRB-type expressions.

There's more stuff, too, but I can't remember right now. I worked 36 hours straight twice last week.

evanh · 2013-10-11 14:53

I was just mulling over the prescaler for a bit and wondered if a base of 4 or 8 might be better. Base-4 definitely seems to be a winner at first glance. Example:

3 bits of base-2 prescaler has a max multiplier of 128 and, say, 8 bits of scaler makes 128x256=32768 total range.
2 bits of base-4 prescaler has a max multiplier of 64 and with 9 bits of scaler also makes 64x512=32768 total range.

This additional scaler bit offsets the extra coarseness in the prescaler. But, I guess the question then becomes, why do this? To save on config bits of course. But reducing the number of config bits means reducing the scaler ... which in turn means the extra coarseness shows its face and the total range is halved.
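The arithmetic in that comparison can be checked with a few lines of Python. This is only a sketch of the reasoning in the post, assuming the prescaler field selects a power of the base (so an n-bit field tops out at base^(2^n - 1)); it does not describe how the Prop2 hardware is actually built.

```python
def prescale_steps(prescale_bits: int, base: int) -> list[int]:
    """All multipliers selectable by a prescale field of the given width."""
    return [base ** code for code in range(2 ** prescale_bits)]


def total_range(prescale_bits: int, base: int, scaler_bits: int) -> int:
    """Largest prescale multiplier times the number of scaler steps."""
    return max(prescale_steps(prescale_bits, base)) * 2 ** scaler_bits


# The two 11-bit configurations compared in the post.
base2 = total_range(prescale_bits=3, base=2, scaler_bits=8)   # 128 x 256
base4 = total_range(prescale_bits=2, base=4, scaler_bits=9)   # 64 x 512

# Dropping one scaler bit from the base-4 layout halves the range.
base4_short = total_range(prescale_bits=2, base=4, scaler_bits=8)

print(prescale_steps(3, 2))  # eight steps, each 2x the previous
print(prescale_steps(2, 4))  # four steps, each 4x the previous
print(base2, base4, base4_short)
```

The step lists show the trade-off: base-4 spends one fewer bit on the prescale field but jumps 4x between settings, so the extra scaler bit is what fills in the gaps.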

evanh · 2013-10-11 15:02

Hmm, and 128x128 is the same total range as 64x256 ... Need to think about this some more, methinks.

jmg · 2013-10-11 15:16

evanh wrote: »

But, I guess the question then becomes, why do this? To save on config bits of course. But reducing the number of config bits means reducing the scaler ... which in turn means the extra coarseness shows its face and the total range is halved.

There are things that can be done to reduce the register-map overhead (like double writes).

Once you remove any register-space cost, it becomes a question of which is simpler.
- If you have to implement a counter of some length anyway, the incremental cost to make all those bits loadable is quite small (and I cannot understand anyone who puts a 16-bit timer in a 32-bit CPU).

NXP do 32-bit counters with a 32-bit prescaler, which might seem slight overkill, but it gives full user control, excellent granularity, and is very easy to understand.
No need to dive into the data sheet to find what mapping that vendor chose.

evanh · 2013-10-11 19:24

I believe there are prescalers in use in the Prop2. Presumably they're there for a reason.

Tubular · 2013-10-13 22:08

Hi All

I picked a bad week to go away on holiday a couple of weeks ago - missed all the new update fun. Last week was about catching up, so now I'm ready to play.

Here are the code changes required to get composite NTSC displaying:

...
  _clkmode = xinput
  _xinfreq = 80_000_000
...

DAT        org
        clkset        maxfreq
...
frqa_        long    384350718
maxfreq        long    %1_1_11_11_11

Next, while messing instructions about, is there any special reason for using CLKSET rather than SETCLK, like we do for setting up all the other peripherals?

A quick look at the P2 instruction set and it seems LOCKSET and CLKSET are the only exceptions to the SETxxxx rule.

potatohead · 2013-10-14 10:07

I held off on porting driver code over. Since Baggers had no signal and Chip has mentioned marginal timing, I thought it best to wait for the next update from Chip. Looks like instructions, a general purpose shifter and some timing / video related tweaks are going to be in there at a minimum.

Both the Invaders game and Chip's swirly thing rendered just fine on my DE2.

You should try it at 20MHz. P2 is still quick enough for a basic display at that speed. At the least, modify one of the color bar samples, which take very few cycles per scan line.

Tubular · 2013-10-14 12:30

Thanks for the tips potatohead, I'll give them a try soon.

Later, I hooked a CRO onto that external jumper to check the counter frequency output, which should be 7.159 MHz for the video signal (gets x8'd to 57.272720 MHz for colorburst). Sometimes I was seeing 1.7 MHz there, and so I'd put in a *4 value for the constant and that would yield a stable display for a bit.

There are other possibilities such as a dodgy DE0 or the power supply driving it, or even the electronics I have hooked up to it all. I'm using a second DE0-Nano so I'll mix it up a bit and see what I can find.

EDIT: Never mind, the clkset wasn't getting called, causing it to run at 20MHz not 80MHz (hence the x4 factor).
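The numbers in this post hang together if the counter is modelled as a 32-bit phase accumulator that adds the FRQ value every system clock. That model is an assumption (it is how the P1 counters behave, and it reproduces the constant posted above), and the function names below are hypothetical. A minimal Python check:

```python
PHASE_BITS = 32  # assumption: 32-bit phase accumulator, FRQ added every clock


def frq_word(f_out_hz: int, f_sys_hz: int) -> int:
    """Nearest FRQ value that produces f_out_hz from a clock of f_sys_hz."""
    return (f_out_hz * (1 << PHASE_BITS) + f_sys_hz // 2) // f_sys_hz


def output_hz(frq: int, f_sys_hz: int) -> float:
    """Output frequency for a given FRQ value and system clock."""
    return frq * f_sys_hz / (1 << PHASE_BITS)


# 57.272720 MHz / 8, the counter output expected in the post.
TARGET_HZ = 57_272_720 // 8

# FRQ value for the intended 80 MHz system clock.
word = frq_word(TARGET_HZ, 80_000_000)

# What that same value produces at 80 MHz and at 20 MHz (clkset not executed).
at_80 = output_hz(word, 80_000_000)
at_20 = output_hz(word, 20_000_000)

print(word)             # compare with the frqa_ constant in the listing
print(at_80 / 1e6)      # about 7.159 MHz
print(at_20 / 1e6)      # about 1.79 MHz, a quarter of the target
print(frq_word(TARGET_HZ, 20_000_000) / word)  # close to 4, the "*4" workaround
```

With the clock stuck at 20 MHz the same constant gives a quarter of the intended frequency, which matches the roughly 1.7 MHz seen on the scope and explains why multiplying the constant by 4 appeared to help.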

All good now (except for the programmer's ego). Sorry for the alarm.
