[SOLVED] Using SETINDx with REPS, a little gotcha.

jmg · 2014-01-30 03:01

ozpropdev wrote: »
Here's my task schedule pattern. There was a typo which I corrected. Thanks
schedule		long	%%2100_1010_1010_1010   'intermittent (typo)

schedule		long	%%2010_1010_1010_1010   'stopped intermittent behaviour
Spacer instruction still fails after typo correction, but intermittent result "flip" has ceased.

Brian

Hmm, so it was meeting
If REPS is used by a task that uses no more than every 2nd time slot, no spacers are needed.

but with the typo, it was adjacent to another slot, but not its own ? (and had 1 x 3 cycle pause).
I wonder if it was the 3 cycles, or the adjacent that caused the effect ?

When 'spaced away' from itself, and any other slot, & with all its slots evenly spaced, then it is stable on the 'velocity' model ?
(and counts the spacer as a block member, so REPS does run, but not quite 'as desired')

To me, the ideal REPS coding would be opcode (address) paced, not time-paced (velocity) but that may be harder to code.
This slot-interaction looks less than ideal.

ozpropdev · 2014-01-30 03:22

jmg wrote: »

but with the typo, it was adjacent to another slot, but not its own ?

The REPS instruction is in slot0 which explains the intermittent behaviour. The typo broke the rule of "no more than every 2nd time slot".
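A minimal Python sketch of that rule, for checking a schedule pattern by hand. It assumes the pattern is 16 base-4 digits, one task number per time slot, and that the slots wrap around; the helper names are made up for illustration and are not part of any Parallax tool.

```python
def parse_schedule(pattern):
    """Turn a '%%2010_1010_1010_1010' style literal into a list of task numbers."""
    digits = pattern.lstrip("%").replace("_", "")
    # Assumption: the rightmost digit is the first slot used. The check below
    # is circular, so it gives the same verdict for either direction.
    return [int(d, 4) for d in reversed(digits)]


def adjacent_slots(slots, task):
    """Return slot indexes where `task` also owns the following slot (with wrap-around)."""
    n = len(slots)
    return [i for i in range(n) if slots[i] == task and slots[(i + 1) % n] == task]


typo = parse_schedule("%%2100_1010_1010_1010")
fixed = parse_schedule("%%2010_1010_1010_1010")

# Task 0 runs the REPS: any hit means it breaks "no more than every 2nd slot".
print("typo :", adjacent_slots(typo, 0))
print("fixed:", adjacent_slots(fixed, 0))
```

With the typo pattern, task 0 owns two neighbouring slots; with the corrected pattern it does not.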

Tor · 2014-01-30 03:42

In my opinion, anything calling itself an assembler, on any architecture (even the i860), must provide a one-to-one mapping from assembly source code to binary opcodes. I should be able to look at the hexdump and find exactly what's in my assembly source code, nothing more and nothing less. Even if my code happens not to work (due to e.g. missing NOPs). If it doesn't do that, it's not an assembler, it's on the road to being a compiler. Which is fine, but the assembler must be an assembler. Add other tools and layers on top to make life easier but don't change the one true starting point: The assembler.
One gotcha with an "intelligent" not-really-an-assembler would be that someone who knows that there must be a nop spacer after a reps in certain situations will of course write that nop there.. does the "intelligent" assembler then add another one automatically by itself? Will it have to analyze and scan the whole source code before it can begin to emit opcodes? That's not what an assembler should do.

-Tor

Heater. · 2014-01-30 04:00

Tor,

I agree. The one to one opcode to instruction mapping has been the way every assembler I have ever worked with has worked.

Unless of course you have a macro assembler and want to make use of macros.

The i860 assembler from Intel gave you a one to one mapping. The problem with that chip was the complexity of the architecture, with multiple pipelines and multiple instructions operating in parallel, all of which was visible to and had to be accounted for by the programmer. As I tried to show above. Exactly like the issues we are discussing with the REPS and threads on the P2 except a thousand times worse!

The guy from Intel giving us a presentation on i860 assembler programming had to admit, after an hour of explaining all the mind-bending complexity, that Intel itself could not figure out how to get peak flops out of that chip for an FFT. Another company had done so, though, but was not telling!

I think at that point everyone in the audience gave up. Most of them were lost and confused after the first half hour.

Tor · 2014-01-30 06:02

Heater,

Heater. wrote: »

I agree. The one to one opcode to instruction mapping has been the way every assembler I have ever worked with has worked. Unless of course you have a macro assembler and want to make use of macros.

There's no conflict there, you still have a one-to-one (assembly source code) instruction/opcode mapping - the macros aren't instructions so that's fine. Tools are good, as long as they are on top of the assembler (and that's true even if the macro capability is an assembler built-in). And assemblers can multi-pass as much as they want of course, to resolve labels and whatnot. What they shouldn't do (and here I believe several of us are in agreement) is to insert opcodes the assembly code author didn't put there (if some of the instructions are defined via macros or not doesn't really matter)

-Tor

jmg · 2014-01-30 11:12

Heater. wrote: »

I agree. The one to one opcode to instruction mapping has been the way every assembler I have ever worked with has worked.

No problems there, the better mnemonic approach I gave in #47, adheres to that rule.

It would work safely, even in changed modes, if the silicon used an opcode transit design.

I'd call the current DOCs not matching the silicon - instead of
REPS #n,#i requires 1 spacer instruction *
REPD #i requires 3 spacer instructions *

The tests above I think show it really should be
REPS #n,#i requires exactly 1 spacer clock, before Loop-Start *
REPD #i requires exactly 3 spacer clocks, before Loop-Start *
* Tasks sliced with greater than this slot distance are considered as giving this space, as they then wait-for-slot

( I wonder now, if the REPS loop counter counts opcodes, or clocks ? )

The better mnemonic approach can still help, as far as the silicon allows.

Users just need to engage the 'I hope I know what I am doing' switch a little more often. ( even tho this thread shows just how common typos and oops can be in the real world. )

Other solutions: If starting the REPS state engine on the next opcode (as in original docs) is too difficult, maybe some pipeline-aware form of WAIT could be used ?

	REPS     #2, Start_Addr, End_Addr ' Blackfin style mnemonic 
	WAITP    #1   'this would need to delay-by1 REPS in a pipeline case, to prevent early-start effect
Start_Addr
	Loop code here 
End_Addr

or, if there is no improvement possible and this has to lurk always there, then I would improve the assembler conditionals/pragmas to compensate somewhat.
Easy to do, like this:

$DEFINE  RepsWarn=NoSlice  'I am not slicing anything - default
	REPS     #2, Start_Addr, End_Addr ' Blackfin style mnemonic 
	NOP   ' ASM warns if this line is empty 
Start_Addr:   ' Checked to be = REPS+2
	Loop code here 
End_Addr
' End_Addr would simply reset RepsWarn to safest default

$DEFINE  RepsWarn=Slice_GE2  'I am slicing, with >= 2 spacing, in all cases
'legal here would be GE3 GE4 GE5...
	REPS     #2, Start_Addr, End_Addr ' Blackfin style mnemonic 
' ASM warns if anything is here
Start_Addr:   ' Checked to be = REPS+1
	Loop code here 
End_Addr
' End_Addr would simply reset RepsWarn to safest default
' REPD adds 2 to the tests, some NOPS can be mixed with some slice settings.

jmg · 2014-01-30 11:19

Tor wrote: »

What they shouldn't do (and here I believe several of us are in agreement) is to insert opcodes the assembly code author didn't put there (if some of the instructions are defined via macros or not doesn't really matter)

The Blackfin Mnemonic form in #47, has no problems meeting this 'rule' - ASM always gives expected binary, following the Author's opcodes to the letter.

Heater. · 2014-01-30 11:28

The issue of whether we are counting clocks or ops seems to be still open. Might be there is an implementation detail gone wrong or a documentation one.

Why not write your example as something like:

reps (0)
// One instruction here.
{
    // Loop code here. Number of instructions here completes the REPS operand above
}

reps (2)
// Nothing here
{
    // Loop code here.  Number of instructions here completes the REPS operand above
}

Much nicer:)

jmg · 2014-01-30 11:35

Heater. wrote: »
The issue of whether we are counting clocks or ops seems to be still open. Might be there is an implementation detail gone wrong or a documentation one.

Why not write your example as something like:
reps (0)
// One instruction here.
{
    // Loop code here. Number of instructions here completes the REPS operand above
}

reps (2)
// Nothing here
{
    // Loop code here.  Number of instructions here completes the REPS operand above
}
Much nicer:)

hehe, yes, that was my original form, but the PASM assembler may be written in x86, so I use instead an easier-to-code Blackfin mnemonic approach. ( Adheres to 'ASM rules'.)

Of course compiler writers can do anything, but they should also include the operator pragmas, so it can check for the correct 0/1/2/3 spacer cases.

Ariba · 2014-01-30 11:39

jmg wrote: »

...
( I wonder now, if the REPS loop counter counts opcodes, or clocks ? )
....

What counts are the instructions that go into the pipeline. Not the clocks. And remember that the instructions of all tasks go into the same pipeline, so an instruction from another task also counts as a spacer for REPS. That is what makes it so hard.

BTW: In one of your examples you used WAIT #3 for 3 spacers, but that is only one instruction, so only one spacer.

You can not just always add a spacer NOP. In this code:

reps #10,#1
   nop
   rdlong inda++,ptra++

you will execute the RDLONG 10 times without tasks, but with tasks you will execute the NOP 10 times (and then the RDLONG once).

But there may be a safe way to code it! :
- Add a spacer NOP after REPS
- Add an additional NOP at the end of the loop
- Set the instruction count in REPS so that it includes the last NOP.

The loops will take 1 clock more than the unsafe ones, but I think they always work.
Ozpropdev's example from post #59 will then look like this:

dummy   long  $FACE0000
        mov   myreg,dummy
        reps  #4,#5
        nop
        shr   myreg,#1
        shr   myreg,#1
        shr   myreg,#1
        shr   myreg,#1
        nop

Without tasks it loops with a NOP at the end; with tasks the NOP at the beginning is included in the loop, but not the NOP at the end.
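A rough Python model of the two cases, to check the counts. It assumes the simplified behaviour described in this post: the repeat block starts one pipeline instruction after REPS, and that slot is filled either by this task's next instruction or by an instruction from another task. The function name and the model itself are illustrative assumptions, not a description of the actual silicon.

```python
def run_reps(after_reps, times, block_len, other_task_fills_spacer):
    """Return the instructions this task executes after a REPS.

    after_reps: mnemonics following the REPS, in source order.
    times, block_len: the two REPS operands (repeat count, instructions per pass).
    other_task_fills_spacer: True when another task's instruction lands in the
    spacer slot, so the block starts directly after REPS (assumed model).
    """
    start = 0 if other_task_fills_spacer else 1
    block = after_reps[start:start + block_len]
    return after_reps[:start] + block * times + after_reps[start + block_len:]


# The unsafe form: reps #10,#1 / nop / rdlong
unsafe = ["nop", "rdlong"]
# The padded form: reps #4,#5 / nop / shr x4 / nop
padded = ["nop", "shr", "shr", "shr", "shr", "nop"]

for tasks in (False, True):
    print("tasks" if tasks else "solo ",
          "unsafe rdlong count:", run_reps(unsafe, 10, 1, tasks).count("rdlong"),
          "| padded shr count:", run_reps(padded, 4, 5, tasks).count("shr"))
```

Under this model the padded form shifts the same number of times in both cases, while the unsafe form does not.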

Andy

Heater. · 2014-01-30 11:47

Ariba,

Now that is cooking!

jmg · 2014-01-30 11:55

Ariba wrote: »
But there may be a safe way to code it! :
- Add a spacer NOP after REPS
- Add an additional NOP at the end of the loop
- Set the instruction count in REPS so that it includes the last NOP.

The loops will take 1 clock more than the unsafe ones, but I think they always work.
Ozpropdev's example from post #59 will then look like this:
dummy   long  $FACE0000
        mov   myreg,dummy
        reps  #4,#5
        nop
        shr   myreg,#1
        shr   myreg,#1
        shr   myreg,#1
        shr   myreg,#1
        nop
Without tasks it loops with a NOP at the end; with tasks the NOP at the beginning is included in the loop, but not the NOP at the end.

Nifty. I think that will work even in Ozpropdev's typo example (i.e. accidentally mixed slices, which are otherwise very hard to check).

Covers REPS, but what about REPD... ? I think more NOPS make it safe ?

Looks like a strong case for NOP insertion to me

( but I can already anticipate the howls... strangely, even if the operator ASKS the assembler to do this. )

Such an approach can work in conjunction with the simple ASM checks on housekeeping & better mnemonics.

Ariba · 2014-01-30 12:06

Yes REPD will need 3 nops at begin and 3 nops at end and takes 3 clocks more, so it may be inefficient for short loops.
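The padding idea can be checked for any spacer count with a small Python sketch. It assumes that other tasks' instructions may fill anywhere from none to all of the spacer slots, so the repeat block can start at any offset from 0 to the spacer count; that offset model and the function name are assumptions for illustration only.

```python
def body_runs(body_len, spacers, times, offset):
    """Count how many real loop instructions execute for a given start offset.

    The program is `spacers` leading NOPs, the loop body, then `spacers`
    trailing NOPs. The repeat length covers the body plus the trailing NOPs.
    `offset` is how many leading NOPs this task itself spends as spacers.
    """
    program = ["nop"] * spacers + ["op"] * body_len + ["nop"] * spacers
    block_len = body_len + spacers
    block = program[offset:offset + block_len]
    executed = program[:offset] + block * times + program[offset + block_len:]
    return executed.count("op")


# REPD-style case: 3 spacers, a 2-instruction body, 5 passes.
for offset in range(4):
    print("offset", offset, "->", body_runs(2, 3, 5, offset), "body instructions")
```

Every offset gives the same count in this model, at the cost of extra NOPs on each pass.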

But REPD is not often used. I never used it so far. I think the main use case is that it allows an endless loop, if you set D to 511.

Andy

potatohead · 2014-01-30 12:12

See? Great case for a macro. To which tight hand optimization may be applied.

GAS has that capability, right? I need to read more about it. http://stackoverflow.com/questions/3127905/macros-using-gas (yes)

I'm an advocate for keeping the reference assembler as clean and lean as possible. I know we should have standards and such, but we don't and we don't due to how things are happening.

One layer up, we get other tools, and those can be anything any of us want to use or code to use and ideally share.

@Andy: Nice.

jmg · 2014-01-30 12:47

Ariba wrote: »
The loops will take 1 clock more than the unsafe ones, but I think they always work.
Ozpropdev's example from post #59 will then look like this:
dummy   long  $FACE0000
        mov   myreg,dummy
        reps  #4,#5
        nop
        shr   myreg,#1
        shr   myreg,#1
        shr   myreg,#1
        shr   myreg,#1
        nop
Without tasks it loops with a NOP at the end; with tasks the NOP at the beginning is included in the loop, but not the NOP at the end.

Thinking some more about this, the REPS/REPD state engines could (optionally?) insert those leading and trailing NOPS, and so make themselves thread tolerant.

That saves code size, and also make the code much more edit-tolerant.

Even if the opcodes always did this, given that the loop-speed gain is still there, it may be better to have solid context tolerance, over rare cases where hand crafting might have made it finish earlier. The code size penalty is removed.
Usually the inner loop is what matters most, and that still runs full speed.

jmg · 2014-01-30 12:54

potatohead wrote: »

GAS has that capability, right? I need to read more about it. http://stackoverflow.com/questions/3127905/macros-using-gas (yes)

I'm an advocate for keeping the reference assembler as clean and lean as possible.

Sounds more like a case for GAS being the 'reference assembler'. (given being able to code-safe, trumps 'lean' every time)

I'm fine with Macros, but here you would need the label form of the mnemonic for it to work.

You could probably build a workaround using 2-3 macro-calls, but I think that merely makes a stronger case for a better mnemonic & user controls.

potatohead · 2014-01-30 13:12

You are asking for a lot of assembler development that simply does not need to happen in order to get SPIN + PASM out there and well defined. That's where we disagree.

And worse, you are conflating things, essentially implying something isn't proper until it meets some criteria you've established as important. You make a lot of absolute statements that really aren't, and I'll leave it at that.

GAS is a "standard" assembler. We had that discussion. So then, if you want to use a Propeller in a "standard" way, it's going to be there for you. SPIN + PASM isn't standard. Sorry. And it's pretty great because it isn't standard too. I'm a strong advocate for that coming to exist just as it did before with P1 so that we get the benefits we did with P1, and that means not polluting it, as already discussed.

jmg · 2014-01-30 13:27

potatohead wrote: »

You are asking for a lot of assembler development that simply does not need to happen in order to get SPIN + PASM out there and well defined. That's where we disagree.

PASM is already out there. This discussion is about how to make code context safer, and harder to break accidentally.

The Assembler improvements are not complex. Tools are there to help.

potatohead · 2014-01-30 13:35

Okie Dokie

Cluso99 · 2014-01-30 14:01

The assembler should not insert the nops. A macro would be the preferred way. Remember, pasm coders may want to take advantage of these slots. Otherwise, the cpu would have auto-inserted a clock delay. This is what assembly is all about.

Bill Henning · 2014-01-30 14:16

I agree with Ray.

There is no need to "simplify" pasm, no need to auto-insert nop's to "protect" programmers. RTFM.

Good warnings in the documentation re/ different behaviour with tasking turned on is all that is needed. If someone mis-uses REP's they will find out quickly enough once the code does not work as expected.

If you want complexity, take a look at all the register descriptions and mode settings of other microcontrollers - WAY more complex than REP et al. Even worse, there are huge differences in said peripherals and registers between different members of the same processor families, yet programs are still written.

jmg · 2014-01-30 14:26

Bill Henning wrote: »

Good warnings in the documentation re/ different behaviour with tasking turned on is all that is needed. If someone mis-uses REP's they will find out quickly enough once the code does not work as expected.

But users never read all the docs, and they do not even think they are misusing REPS, when they edit something else, and their code breaks.
That code could even be an OBEX, so they do not look there.... We've seen above, what a typo can do..

There are ways to make this tasking tolerant, and thus a whole lot more user friendly.

Given the frequent demands for 'no surprises', I would have thought this was in the no-brainer basket ?

mindrobots · 2014-01-30 14:29

+1 Ray, Bill and Potatohead (and anybody else in that camp) - there's a lot of Propeller PASM experience talking. (and even a lot of P2 code experience already!)

I haven't assembled for profit in a long time but when I did, I wanted absolute control over the code being output. If I didn't I'd use some other language higher up the tree. If the behavior is documented and, where it's really tricky, some examples are provided, a simple assembler is all you need - if you want to get fancy or need help, use some macros. Nothing should be automatic or generated to help you.

jmg · 2014-01-30 14:31

Cluso99 wrote: »

Otherwise, the cpu would have autoinserted a clock delay.

Some vendors have their silicon do exactly this. Microchip is one.
I wonder why they chose to do that, rather than force the user to (mis)manage it ?

mindrobots · 2014-01-30 14:35

I think if the user is writing PASM and wants to use the REPS, then they will have to look at the manual. When they do, they should notice the Tip:, Note:, Warning:, Caution:, Timing Concern: or whatever the call out is for people to take pause and think about what they are doing. Here, there should be any special notes and considerations and examples if needed.

I don't think anyone will be sitting down, keying in 'reps #4,#5' and hoping for the best.

Since Parallax is headed toward more open source products, jmg, this is the perfect opportunity for you to contribute your high level pasm assembler to the community!

If it catches on, cool! If it doesn't, you at least gave it your best shot!

jmg · 2014-01-30 14:43

mindrobots wrote: »

I think if the user is writing PASM and wants to use the REPS, then they will have to look at the manual. When they do, they should notice the Tip:, Note:, Warning:, Caution:, Timing Concern: or whatever the call out is for people to take pause and think about what they are doing. Here, there should be any special notes and considerations and examples if needed.

If you read all the details above, you will see this alone is not enough.

The user may not be coding using REPS at all, (or so they think), and yet can still manage to break some library code.

All the notes in the world are no help, if they do not know they have broken someone else's REPS.

Why not design the tools + Silicon to make the chip easy to use, and remove the nasty gotchas lurking, where practical to do so ?
I thought that was what Parallax tried to do, on the whole.

Tor · 2014-01-30 14:48

To me, finding that the assembler auto-inserts nops or anything else I didn't put there.. that would be the nasty gotcha.

-Tor

jmg · 2014-01-30 14:51

Tor wrote: »

To me, finding that the assembler auto-inserts nops or anything else I didn't put there.. that would be the nasty gotcha.

Sigh. It is actually this simple : The assembler does nothing you do not ask it to do. Period.

Dave Hein · 2014-01-30 14:55

It might be a good idea to include some form of conditional assembly in the assembler so that code can work when running stand-alone or with other tasks. Maybe something like the #define and #ifdef directives used with C. The NOPs would be added or not, depending on an assembler flag. So using the previous example, the code would look something like this.

#define MULTI_TASK
...
    
dummy   long  $FACE0000
        mov   myreg,dummy
        reps  #4,#4
#ifndef MULTI_TASK
        nop
#endif
        shr   myreg,#1
        shr   myreg,#1
        shr   myreg,#1
        shr   myreg,#1

Heater. · 2014-01-30 14:58

This little argument has been going on long enough.

The reality, as far as I can tell, is that Chip is writing the assembler and that will be the "reference" on account of it being the first and there being no language specification to work to.

Let's just see how it goes.
