Propeller II update - BLOG

cgracey · 2014-02-25 15:14

Ay, ay, ay!!!

I just found the problem. It was all my fault! I had enabled every cog (thinking only cog0 was enabled) to output its hub_sel pattern to pins 7..0. Well, you know that pins from cogs get OR'd together. That's why I was seeing double or triple. So, now I know that only cog0 is going to be outputting and I can get back to figuring out why things really aren't working.

Thanks for all your help, Guys. This goose chase is over, thankfully, and rationality prevails.
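
(Editor's sketch.) The symptom above follows from the pin outputs of all cogs being OR'd together. The snippet below is a minimal illustration, assuming each cog drives a one-hot 8-bit pattern onto pins 7..0. The pattern values and the `pin_state` helper are hypothetical; the thread does not give the actual hub_sel values.

```python
def pin_state(cog_outputs):
    """OR together the 8-bit value each enabled cog drives onto pins 7..0."""
    pins = 0
    for out in cog_outputs:
        pins |= out & 0xFF
    return pins


# Hypothetical one-hot patterns, one per enabled cog.
only_cog0 = pin_state([0b00000001])
three_cogs = pin_state([0b00000001, 0b00000010, 0b00000100])

print(f"{only_cog0:08b}")   # 00000001: a single bit, as expected from cog0 alone
print(f"{three_cogs:08b}")  # 00000111: several bits set, the "double or triple" view
```

With only one cog enabled, the pins show that cog's pattern unchanged, which is why disabling the other cogs' outputs cleared up the readings.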

Sapieha · 2014-02-25 15:22

Hi Chip.

I see you are like me: to find a problem in one place, you need to start thinking about other things, to free your head from unneeded info.

And that was the help from us.

cgracey wrote: »

Ay, ay, ay!!!

I just found the problem. It was all my fault! I had enabled every cog (thinking only cog0 was enabled) to output its hub_sel pattern to pins 7..0. Well, you know that pins from cogs get OR'd together. That's why I was seeing double or triple. So, now I know that only cog0 is going to be outputting and I can get back to figuring out why things really aren't working.

Thanks for all your help, Guys. This goose chase is over, thankfully, and rationality prevails.

Cluso99 · 2014-02-25 16:16

Great news Chip.
You best get some sleep now as I bet you haven't had much in the last day or so!

ozpropdev · 2014-02-25 16:43

cgracey wrote: »

Ay, ay, ay!!!

I just found the problem. It was all my fault! I had enabled every cog (thinking only cog0 was enabled) to output its hub_sel pattern to pins 7..0. Well, you know that pins from cogs get OR'd together. That's why I was seeing double or triple. So, now I know that only cog0 is going to be outputting and I can get back to figuring out why things really aren't working.

Thanks for all your help, Guys. This goose chase is over, thankfully, and rationality prevails.

Looking at the positive side of this, you have a slight gain in compile time now from an upgrade of Quartus...a win right?

It has also highlighted the GIGANTIC amount of work behind the scenes to keep our little FPGAs going. Bravo, Chip!

jmg · 2014-02-25 17:23

ozpropdev wrote: »

Looking at the positive side of this, you have a slight gain in compile time now from an upgrade of Quartus...a win right?

Altera were also claiming significant gains for smaller changes in designs, be interesting to see if that is real, given
they did claim gains well above what Chip actually saw too...

potatohead · 2014-02-25 17:31

Excellent!

I'm glad it's self-inflicted and that you found it, Chip. Bugs like that which aren't self-inflicted make me very nervous...

Ariba · 2014-02-25 21:23

jmg wrote: »

Altera were also claiming significant gains for smaller changes in designs, be interesting to see if that is real, given
they did claim gains well above what Chip actually saw too...

I hope they are right, this will force Chip to only make small changes now...

Andy

cgracey · 2014-02-26 00:58

jmg wrote: »

Altera were also claiming significant gains for smaller changes in designs, be interesting to see if that is real, given
they did claim gains well above what Chip actually saw too...

Back in the saddle now. I was able to find what the problem behind the "problem" was - I had an errant logic equation qualifying the "go" signal that steps the pipeline. On initialization, it was stepping before it should have, causing some real consternation to me. It's recompiling now with constant TRACE output, so I can watch it from reset. I think it's correct now, but I want to see it work.

Quartus is not working any faster than before, it seems, for those wondering. I'm just glad it's working, as it does a complicated job.

Sapieha · 2014-02-26 01:14

Hi Chip.

Very GOOD news.

cgracey wrote: »

Back in the saddle now. I was able to find what the problem behind the "problem" was - I had an errant logic equation qualifying the "go" signal that steps the pipeline. On initialization, it was stepping before it should have, causing some real consternation to me. It's recompiling now with constant TRACE output, so I can watch it from reset. I think it's correct now, but I want to see it work.

Quartus is not working any faster than before, it seems, for those wondering. I'm just glad it's working, as it does a complicated job.

Baggers · 2014-02-26 03:10

cgracey wrote: »

Back in the saddle now. I was able to find what the problem behind the "problem" was - I had an errant logic equation qualifying the "go" signal that steps the pipeline. On initialization, it was stepping before it should have, causing some real consternation to me. It's recompiling now with constant TRACE output, so I can watch it from reset. I think it's correct now, but I want to see it work.

Quartus is not working any faster than before, it seems, for those wondering. I'm just glad it's working, as it does a complicated job.

Great news, Chip. Glad you sorted the problem behind the bug; there's nothing worse than having something like that, especially this close to the deadline.

Make sure you get plenty of sleep between now and the deadline, and keep a clear and focused head.

cgracey · 2014-02-26 03:23

I've got COGRUNX/COGNEWX working which launches cogs directly into hub memory. Now, I'm going to add the prefix long for COGRUN/COGNEW which will instruct the tiny cog ROM loader program how to load the cog from hub memory. After that, I need to update the hub ROM to support the new prefix long for its three programs.

ozpropdev · 2014-02-26 04:01

cgracey wrote: »

I've got COGRUNX/COGNEWX working which launches cogs directly into hub memory. Now, I'm going to add the prefix long for COGRUN/COGNEW which will instruct the tiny cog ROM loader program how to load the cog from hub memory. After that, I need to update the hub ROM to support the new prefix long for its three programs.

Hi Chip

Good to see you're cruising again.
BTW: Were the "problems" you experienced relevant to the COGNEW issues that Potatohead and I had?

Brian

cgracey · 2014-02-26 04:16

ozpropdev wrote: »

Hi Chip

Good to see you're cruising again.
BTW: Were the "problems" you experienced relevant to the COGNEW issues that Potatohead and I had?

Brian

I don't know. I need to review what problems you were having. Would you mind posting a link? I know this was just the other day, but I can't remember the details.

ozpropdev · 2014-02-26 04:42

cgracey wrote: »

I don't know. I need to review what problems you were having. Would you mind posting a link? I know this was just the other day, but I can't remember the details.

Chip

Here's the link
Funky COGNEW behaviour

Cheers
Brian

cgracey · 2014-02-26 05:20

ozpropdev wrote: »

Chip

Here's the link
Funky COGNEW behaviour

Cheers
Brian

Thanks, Brian.

cgracey · 2014-02-26 05:23

In implementing the COGRUN/COGNEW that variably loads the cog from hub according to a prefix long, I'm finding that it's more practical to do something that complicated by using COGRUNX/COGNEWX. What would you think if we kept COGRUN/COGNEW simple and just had it load up the cog and jump to $000. It would be a lot faster, too.
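
(Editor's sketch.) The simplified COGRUN/COGNEW behaviour proposed above amounts to "copy a block of hub longs into the cog, then start at $000". The model below is only an illustration: the `cogrun_simple` name, the 512-long cog size, and the zero padding for short images are assumptions, not details from the thread.

```python
COG_LONGS = 512  # assumption: number of longs copied into the cog


def cogrun_simple(hub, start):
    """Copy COG_LONGS longs from hub[start:] into a cog image.

    Returns (cog_ram, pc). Execution always begins at $000, with no
    prefix long to interpret. Short images are zero-padded (assumption).
    """
    image = list(hub[start:start + COG_LONGS])
    cog_ram = image + [0] * (COG_LONGS - len(image))
    return cog_ram, 0x000


hub = list(range(1000))           # stand-in for hub memory, one int per long
cog_ram, pc = cogrun_simple(hub, 100)
print(len(cog_ram), cog_ram[0], pc)
```

Anything fancier, such as partial or variable loads, would then live in a hub-resident loader started with COGRUNX/COGNEWX, as suggested in the post.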

Seairth · 2014-02-26 05:36

cgracey wrote: »

In implementing the COGRUN/COGNEW that variably loads the cog from hub according to a prefix long, I'm finding that it's more practical to do something that complicated by using COGRUNX/COGNEWX. What would you think if we kept COGRUN/COGNEW simple and just had it load up the cog and jump to $000. It would be a lot faster, too.

If I'm understanding correctly, that should be fine. It seems like the best place to run the mini-loader (what you were going to put in the cog ROM) is in the HUB anyhow. That, and we would not be limited to just one design approach, which the cog ROM would have imposed.

cgracey · 2014-02-26 06:02

Seairth wrote: »

If I'm understanding correctly, that should be fine. It seems like the best place to run the mini-loader (what you were going to put in the cog ROM) is in the HUB anyhow. That, and we would not be limited to just one design approach, which the cog ROM would have imposed.

That's right.

Bill Henning · 2014-02-26 06:15

Makes sense to me.

(glad you squished the bug!)

cgracey wrote: »

In implementing the COGRUN/COGNEW that variably loads the cog from hub according to a prefix long, I'm finding that it's more practical to do something that complicated by using COGRUNX/COGNEWX. What would you think if we kept COGRUN/COGNEW simple and just had it load up the cog and jump to $000. It would be a lot faster, too.

cgracey · 2014-02-26 06:24

Bill Henning wrote: »

Makes sense to me.

It is done!

Next on the list is coming up with some instruction(s) to get and set the tasks' Z/C/PC and lock-pending states. This will enable single-stepping and task-switching fun.
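
(Editor's sketch.) To show why get/set access to a task's Z/C/PC and lock-pending state is enough for task switching, here is a small model of swapping one context for another. The `TaskState` fields mirror the items named in the post; the class, the `swap_context` helper, and the example addresses are hypothetical and do not represent the eventual instructions.

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    z: bool = False
    c: bool = False
    pc: int = 0
    lock_pending: bool = False


def swap_context(slot, incoming):
    """Read out the state in a hardware task slot ("get"), then load a new one ("set")."""
    outgoing = TaskState(slot.z, slot.c, slot.pc, slot.lock_pending)
    slot.z, slot.c = incoming.z, incoming.c
    slot.pc, slot.lock_pending = incoming.pc, incoming.lock_pending
    return outgoing


slot = TaskState(z=True, c=False, pc=0x120, lock_pending=False)
saved = swap_context(slot, TaskState(z=False, c=True, pc=0x040, lock_pending=True))
print(saved)  # the preempted task's state, ready to be restored later
print(slot)   # the slot now runs the incoming task from pc=0x040
```

Single-stepping falls out of the same pair of operations: read the state, let one instruction run, and read it again.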

cgracey · 2014-02-26 06:30

Question: Currently, instruction cache pre-fetching is the default mode. This will work best for single-task programs, but not well for 4-task programs, as it will disturb what might be stable caches and take more power, due to frequent and likely unnecessary background RDWIDEs. Should the default be 'pre-fetch off', and then you enable pre-fetch with an instruction if you want it? When launching a cog into hub memory via COGRUNX/COGNEWX, you probably want it on, as it could make things run 2x faster. What do you think the default should be?
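
(Editor's sketch.) The trade-off in the question can be seen in a toy cache model. It assumes four cache lines of eight longs each, LRU replacement, and a prefetcher that fetches the next sequential line whenever it is not already cached. All three are assumptions for illustration, as are the names `run`, `LINE_LONGS` and `NUM_LINES`; the real hardware may behave differently.

```python
from collections import OrderedDict

LINE_LONGS = 8  # assumption: one background read fills 8 longs
NUM_LINES = 4   # assumption: four cache lines


def run(trace, prefetch):
    """Return (misses, background_reads) for a sequence of long addresses."""
    cache = OrderedDict()  # line number -> None, least recently used first
    misses = background = 0

    def touch(line):
        cache[line] = None
        cache.move_to_end(line)
        while len(cache) > NUM_LINES:
            cache.popitem(last=False)  # evict the least recently used line

    for addr in trace:
        line = addr // LINE_LONGS
        if line not in cache:
            misses += 1
        touch(line)
        if prefetch and line + 1 not in cache:
            background += 1
            touch(line + 1)
    return misses, background


# One task running straight through 64 longs.
single = list(range(64))
# Four tasks, each looping inside its own cache line, interleaved round-robin.
four = [base + (i % 8) for i in range(16) for base in (0, 80, 160, 240)]

for name, trace in (("single task", single), ("four tasks", four)):
    print(name, "no prefetch:", run(trace, False), "prefetch:", run(trace, True))
```

In this model prefetching removes almost all stalls for the straight-line task, but for four tasks with otherwise stable lines every prefetch evicts a line another task is about to need, so both the miss count and the background reads climb.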

ctwardell · 2014-02-26 06:41

cgracey wrote: »

Question: Currently, instruction cache pre-fetching is the default mode. This will work best for single-task programs, but not well for 4-task programs, as it will disturb what might be stable caches and take more power, due to frequent and likely unnecessary background RDWIDEs. Should the default be 'pre-fetch off', and then you enable pre-fetch with an instruction if you want it? When launching a cog into hub memory via COGRUNX/COGNEWX, you probably want it on, as it could make things run 2x faster. What do you think the default should be?

Can we have instructions for both on and off?

The reason is I can see manually controlling it in power-sensitive applications. There may be some sections of code that work well with caching, where we want it on, and others that we know thrash badly, and for those we can turn it off.

C.W.

Bill Henning · 2014-02-26 06:42

How about:

COGRUN - pre-fetching off by default (as this is most likely used for cog drivers)
COGRUNX - pre-fetching on by default (as this starts in single task hub-exec)

cgracey wrote: »

Question: Currently, instruction cache pre-fetching is the default mode. This will work best for single-task programs, but not well for 4-task programs, as it will disturb what might be stable caches and take more power, due to frequent and likely unnecessary background RDWIDEs. Should the default be 'pre-fetch off', and then you enable pre-fetch with an instruction if you want it? When launching a cog into hub memory via COGRUNX/COGNEWX, you probably want it on, as it could make things run 2x faster. What do you think the default should be?

cgracey · 2014-02-26 06:54

ctwardell wrote: »

Can we have instructions for both on and off?

The reason is I can see manually controlling it in power-sensitive applications. There may be some sections of code that work well with caching, where we want it on, and others that we know thrash badly, and for those we can turn it off.

C.W.

There are two instructions which control pre-fetch: ICACHEP (pre-fetch on) and ICACHEN (no pre-fetch).

cgracey · 2014-02-26 06:55

Bill Henning wrote: »

How about:

COGRUN - pre-fetching off by default (as this is most likely used for cog drivers)
COGRUNX - pre-fetching on by default (as this starts in single task hub-exec)

That's probably a good approach. I wish there was some compelling reason to always default it one way. I think if we had to choose, we'd have pre-fetch default to ON. You turn it off if you've got 4-way hub exec tasking or you want a maximum-sized cache for a single-task loop.

Seairth · 2014-02-26 06:58

cgracey wrote: »

Question: Currently, instruction cache pre-fetching is the default mode. This will work best for single-task programs, but not well for 4-task programs, as it will disturb what might be stable caches and take more power, due to frequent and likely unnecessary background RDWIDEs. Should the default be 'pre-fetch off', and then you enable pre-fetch with an instruction if you want it? When launching a cog into hub memory via COGRUNX/COGNEWX, you probably want it on, as it could make things run 2x faster. What do you think the default should be?

If the issue arises in multitask (2 or more?) mode, maybe just enable/disable it based on the tasking mode?

ctwardell · 2014-02-26 07:04

Seairth wrote: »

If the issue arises in multitask (2 or more?) mode, maybe just enable/disable it based on the tasking mode?

It's only an issue when the multiple tasks are using hubexec so I don't think we would always want it off when multitasking.

C.W.

cgracey · 2014-02-26 07:08

I'm thinking that keeping it the way it is (defaults to pre-fetch) is probably best. Even if you start a cog loaded from hub memory and it starts a hub exec task, it may only be one task and would want pre-fetching.

Bill Henning · 2014-02-26 07:12

Makes sense.

A cog-mode driver cog (single or multi-tasking) would not care about the pre-fetch setting

cgracey wrote: »

I'm thinking that keeping it the way it is (defaults to pre-fetch) is probably best. Even if you start a cog loaded from hub memory and it starts a hub exec task, it may only be one task and would want pre-fetching.

cgracey · 2014-02-26 07:17

I'm thinking about this task redirecting issue. It will take a number of cycles to switch a task's context, but in preemptive multitasking, how often does that need to occur? I think probably 100 kHz would be plenty fast, wouldn't it? If you need fast communication protocols, then use a whole task, but for general-purpose computing where you just want to write a big program without a bunch of cooperative yields, switching at kilohertz rates is already finer than human perception. Am I missing something?
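
(Editor's sketch.) A quick way to weigh the question is to compute what fraction of the cog's cycles context switching would consume. The clock rate and cycles-per-switch figures below are hypothetical placeholders, since the post only says a switch takes "a number of cycles"; `switch_overhead` is an illustrative helper.

```python
def switch_overhead(clock_hz, switches_per_sec, cycles_per_switch):
    """Fraction of all clock cycles spent on context switches."""
    return switches_per_sec * cycles_per_switch / clock_hz


# Hypothetical figures: 100 MHz clock, 50 cycles per switch.
for rate in (1_000, 10_000, 100_000):
    frac = switch_overhead(100_000_000, rate, 50)
    print(f"{rate:>7} switches/s -> {frac:.2%} of cycles")
```

Under these placeholder numbers, even 100 kHz switching costs a few percent of the cycles, and kilohertz rates cost a small fraction of a percent.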
