[resolved][puzzle] singularity

kuroneko · 2009-12-15 01:27

Disclaimer: This is not a puzzle in the sense of the one(s) previously posted. I know the result but no information available to me can explain the behaviour.

The special register area (SRA) can be used for storing data and - if done carefully - can be used for code as well (if you don't mind leaving dira alone and don't require counters and/or video). The odd thing here is that while vscl ($1FF) can be written to with any value (and you'll read back the same value) it is completely ignored for code execution.

In the code fragment below I fill the SRA (excluding dira) with an ordinary add instruction (keeps counters and video off). So the expected result is $8F. However, all I get is $8E. This can be narrowed down to code not being executed at address $1FF.

Question is why? My guess would be that it somehow involves handling the address wrap-around (execution continues at $000). Sampling cnt before the jump to SRA and after the jmpret as a result of the wrap-around shows that 18 4-cycle instructions are executed (14x add, 1x nop^a, 1x nop^b, 2x jmp[ret]).

Thoughts?

Solution: It turns out that you can in fact execute code at $1FF provided you don't go there by normal means (jmp & Co, arriving via $1FE). All it takes is to pretend that you're not actually at $1FF while the instruction is executed. The opposite is also true, any $xxx:$000 phase jump will only execute the instruction at $000 because the PC is set to $1FF while we are at $xxx, meaning the first target is ignored.

DAT             org     0

start           jmpret  $, #setup
                wrlong  report, par
                
                cogid   cnt
                cogstop cnt

setup           mov     $1F0, inst
                mov     $1F1, inst
                mov     $1F2, inst
                mov     $1F3, inst
                mov     $1F4, inst
                mov     $1F5, inst
                mov     $1F6, #0      ' avoid dira
                mov     $1F7, inst
                mov     $1F8, inst
                mov     $1F9, inst
                mov     $1FA, inst
                mov     $1FB, inst
                mov     $1FC, inst
                mov     $1FD, inst
                mov     $1FE, inst
                mov     $1FF, inst
                jmp     #par
                
inst            add     report, #1
report          long    %1000_0000

                fit

^a dira being left alone
^b add at $1FF, isn't executed but consumes 4 cycles
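
The counts above can be cross-checked with a toy model. This is only a sketch in Python: it assumes the rule this thread arrives at (an instruction reached with the PC at $1FF is turned into a nop), and `run_sra` is a made-up helper name, not anything from the Propeller tools.

```python
# Toy model of the SRA experiment above. The abort rule is the thread's
# conclusion, not documented hardware behaviour.
DIRA = 0x1F6   # written with #0 by the setup code, so it executes as a nop
LAST = 0x1FF   # vscl, the last cog address

def run_sra(abort_at_1ff=True):
    """Return (report, slots) after walking $1F0..$1FF once."""
    report = 0x80              # initial value of 'report' (%1000_0000)
    slots = 0                  # 4-cycle instruction slots spent inside the SRA
    for pc in range(0x1F0, 0x200):
        slots += 1
        if pc == DIRA:
            continue           # nop (dira left alone)
        if pc == LAST and abort_at_1ff:
            continue           # add at $1FF is skipped but still costs a slot
        report += 1            # add report, #1
    return report, slots

report, slots = run_sra()
print(hex(report), slots + 2)  # + jmp #par and the jmpret at $000
```

With the abort rule enabled the model gives $8E and 18 instruction slots; calling `run_sra(False)` gives the originally expected $8F.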

Post Edited (kuroneko) : 12/16/2009 11:32:23 PM GMT

jazzed · 2009-12-15 03:22

I found that if I use immediate source $1F2 (or other address) that the shadow register is accessed.
Maybe this answer is too simple? ... Edit: Oooops. Wrong question

That's some weird behavior.

Post Edited (jazzed) : 12/15/2009 5:46:30 AM GMT

MagIO2 · 2009-12-15 14:21

Maybe it's hardcoded as NOP (0) ...

When doing a cognew the first instruction cycle might look like 'do nothing' just to get the first instruction with that cycle and executing it with the second cycle. In this way cognew will set the PC to $1ff and let it do nothing but the wraparound? That was at least my 'working assumption' when writing my propeller simulator.

Maybe it's only the instruction fetch which has this kind of problem (this hardcode). Guess it can still be used as shadow register, can't it?

Post Edited (MagIO2) : 12/15/2009 2:26:31 PM GMT

Cluso99 · 2009-12-15 14:50

Not all the registers and their shadow registers work alike. IIRC, one of the shadow registers always reads as $0.

I used the first 4 ($1F0-$1F3) to run a LMM program loop for my spin & pasm zero footprint debugger.

However, currently I do not have time to investigate further.

I re-read your findings. All I can suggest is that the wrap-around is detected and causes an abort of the current instruction at $1FF before the writeback phase. Execution begins at $000. I guess the pipeline flush, which would cause a refetch in the I phase, does not occur (given your findings with cnt).

Post Edited (Cluso99) : 12/15/2009 3:13:47 PM GMT

MagIO2 · 2009-12-15 21:03

Longer version of my first post:
After loading 512 longs into COG RAM the COG has to start execution. The goal is to run the program from the beginning of COG RAM. But as the Propeller uses pipelining and the first stage of the pipeline is NOT the fetch of an instruction, this can't be done simply by setting the PC to $000. Having the PC set to $000 means that the first instruction fetched would be the one at $001. How to solve this: set the address to $1FF during start of a COG.

The problem we still have is 'What happens in the first instruction cycle?' No extra logic (meaning wasted die space) should be needed for this - or at least it should be as simple as possible. Solution: run the first cycle empty except for the part that fetches the next instruction and increments the PC. So, I believe that the address $1FF simply drives a signal line low (by NAND of all address bits - NANDs are easy). This signal tells all stages except the ones that fetch the next instruction and increment the PC to do nothing.

I'd bet that the register that holds the current instruction is not initialized when starting a COG. So, what's in there is simply undefined and needs to be skipped.

Post Edited (MagIO2) : 12/15/2009 9:08:48 PM GMT

kuroneko · 2009-12-16 02:20

@all: Thanks for the input!

Ray, you wondered what a phase jump is good for. I had an idea this morning. What if I don't let execution continue (wrap) but get out of there immediately? So I came up with this:

DAT             org     0

start           jmpret  $, #setup

                or      report, #%0100_0000     ' wrap marker
response        wrlong  report, par
                
                cogid   cnt
                cogstop cnt

setup           mov     $1FF, inst
'               jmp     #$1FF                   ' disabled: old-style jump
                
                movi    ctra, #%0_11111_000
                mov     frqa, #2
                mov     phsa, #$1FB
                jmp     phsa                    ' jmp #$1FF continue at $002 (response)

inst            add     report, #1
report          long    %1000_0000

                fit

If you activate the jmp #$1FF at (setup + 1) then you get old-style behaviour. Reported value is $C0. Pulling a $1FF:$002 phase jump I get $81 as the result. Very nice!

Which clears up two things IMO:

execution at $1FF is in fact possible (result %x0xx_xxx1)
a PC wrap will abort the current instruction (most likely to aid cog startup as MagIO2 pointed out)

Does that make sense?

Another interesting observation. If I place an immediate jump at $1FF then old-style execution will still have this instruction aborted, while a phase jump allows the jump to proceed (even though the phase jump itself is interrupted/aborted).

Post Edited (kuroneko) : 12/16/2009 3:24:19 AM GMT

MagIO2 · 2009-12-16 05:41

Ah ... I see ... it's not the address itself which 'aborts' the instruction execution, it's a 'carry flag' of the PC which does. In case of a wraparound the PC is incremented by 1 and an overflow occurs. In case of a phase jump the PC is not calculated but loaded, so we don't have the overflow. That's even easier than having a NAND over all address lines.

kuroneko · 2009-12-16 06:09

MagIO2 said...
In case of phase jump the PC is not calculated, but loaded and we don't have the overflow.

I think the important thing here is the value of the PC where the cog thinks it is.

jumping to $1FF normally or arriving via $1FE indicates we are now at $1FF which - by whatever means - causes what we think is an abort (nop'ify).
arriving via a phase jump (except any:$000) pretends that the current location is in fact second target - 1 (i.e. $001), so the trigger doesn't go off

Post Edited (kuroneko) : 12/16/2009 11:18:08 PM GMT

kuroneko · 2010-11-11 18:14

Continuation of discussion started here:

mpark wrote: »

I guess I'm not understanding what "next instruction" means. Say jmp #0 lives at address 10; what is the next instruction? I was thinking it would be whatever is at address 0. I'm not seeing why you say the next fetch is from 1.

It now dawns on me why you asked the question. Just having a fetch from #0 isn't that special, the actual condition is that the PC must be $1FF (which would normally lead to a fetch from #0). Serves me right for being on-line with a headache.

But you're right, a jmp #0 will obviously involve a fetch from #0. Thanks for spotting that.

Dave Hein wrote: »

Include me in the confused group. What's a phased jump? Does it have to do with pipeline, such as with a JNZ where the target address is prefetched rather than the next address? So a jmp #0 is a NOP, or is the instruction at $1ff a NOP? Please explain using small words.

This puzzle is where it all started. What is a phase jump (thread 118159)? Nothing sinister, just a jump using a phase register (phsx) as jump target. This in itself (using a register as opposed to an immediate value) is nothing special until you activate the counter. You can do fun things like executing the two adds but not the sub (base = target - 2, frqx = 1).

target  add temp, #1    ' executed
        sub temp, #4    ' skipped
        add temp, #1    ' executed

So to sort out the mess from the last couple of days, any instruction where the PC is $1FF is aborted (nop). This is true for location $1FF and - because it becomes a nop - the next fetch is from #0. It's easy enough to check. Just put some evil instruction there and see how it is ignored (make sure you catch re-entry at #0).

How can this be simulated (as vscl isn't a nice place to be in)? We simply have to make the cog think that it's at $1FF when it's not.

live registers like phsx are sampled during e-phase (or after D-phase, SDeR)
this means frqx has been added twice to the value found in phsx at the beginning of the jmp phsx
in other words the jump target is base + 2*frqx
the new PC is written during the R-phase but as the counter is still active that will be base + 3*frqx
this new PC is seen when the instruction at target is executed, and if $1FF will abort the instruction
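
The timing rules above can be written down as a small Python model and checked against the two examples already given in this thread. It is only a sketch of the description above, not of the actual silicon; `phase_jump` and `COG_MASK` are names invented for the illustration.

```python
# Model of "jmp phsx" with the counter running in an always-count mode,
# following the rules listed above (an assumption, not documented behaviour).
COG_MASK = 0x1FF   # cog addresses are 9 bits wide

def phase_jump(base, frq):
    """Return (target, pc_seen, aborted, next_fetch) for phsx = base."""
    target = (base + 2 * frq) & COG_MASK    # phsx as sampled for the jump
    pc_seen = (base + 3 * frq) & COG_MASK   # PC written during the R-phase
    aborted = pc_seen == COG_MASK           # PC = $1FF turns target into a nop
    next_fetch = (pc_seen + 1) & COG_MASK   # execution carries on after pc_seen
    return target, pc_seen, aborted, next_fetch

# $1FF:$002 phase jump from the earlier post: phsa = $1FB, frqa = 2
print([hex(v) for v in phase_jump(0x1FB, 2)])

# add/sub/add example: base = target - 2, frqx = 1 (target = $020 picked arbitrarily)
print([hex(v) for v in phase_jump(0x020 - 2, 1)])
```

In the first case the target is $1FF, the PC seen there is $001 and execution continues at $002. In the second case the instruction at target runs, target + 1 is skipped and target + 2 runs next.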

This gives us two equations with two variables.

  base + 2*frqx = target
  base + 3*frqx = -1 ($1FF)    ; abort condition

  target - 2*frqx = -1 - 3*frqx
             frqx = -1 - target
             frqx = -(1 + target)

             base = target - 2*frqx
             base = target + 2*(1 + target)
             base = 3*target + 2

So for small enough target addresses we end up with this code sequence:

        movi    ctrx, #%0_11111_000
        neg     frqx, #target + 1
        mov     phsx, #3*target + 2
        jmp     phsx

target  waitvid 0, 0
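
As a quick sanity check of the derivation, the closed-form values can be plugged back into the two conditions. The Python sketch below does that for every cog address. It also works out what "small enough" could mean, assuming the limit is the 9-bit immediate field of `neg` and `mov` (so both `target + 1` and `3*target + 2` must fit in 0..511). `abort_setup` is a made-up helper name.

```python
# Verify frqx = -(1 + target) and base = 3*target + 2 against both conditions.
COG_MASK = 0x1FF

def abort_setup(target):
    """Return (frqx, base) that make the instruction at 'target' abort."""
    return -(1 + target), 3 * target + 2

for target in range(0x200):
    frq, base = abort_setup(target)
    assert (base + 2 * frq) & COG_MASK == target     # jump lands on target
    assert (base + 3 * frq) & COG_MASK == COG_MASK   # PC seen there is $1FF

# Assumption: "small enough" means both immediates fit in 9 bits.
max_target = max(t for t in range(0x200)
                 if t + 1 <= COG_MASK and 3 * t + 2 <= COG_MASK)
print(hex(max_target))
```

Under that assumption the sequence works as written for targets up to $0A9; larger targets would need frqx and phsx loaded from registers instead of immediates.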
