Update: Released new version! - Jump table in graphics.spin

stevenmess2004 · 2008-02-09 09:28

The jump table in graphics.spin is made up of 8-bit entries (bytes), but a byte can only reach half of the cog memory. This is all right for most of the commands, but I would have thought that the textmode_ subroutine was more than 255 longs from the jump. It obviously isn't, since it works. Interesting example. Can you use the labels in Spin so that the jump table is eliminated? So instead of passing the number of the method, i.e. 1-15, you would pass the address (using the label) of where to jump to? This would save a fair few longs in the cog and possibly allow for another function.

Steven

Edit - Out of this discussion I developed a new version of graphics.spin that does XOR and can use the ROM font and Clemens font. See last post for latest version.

Post Edited (stevenmess2004) : 4/20/2008 12:18:48 PM GMT

deSilva · 2008-02-09 10:07

(a) Yes you could provide "COG cell numbers" to a COG

action := (@asmlabel-@startOfCOG)>>2

(b) _textmode and _fill are at $DA and $DD respectively, so there is ample space up to $FF
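As a rough illustration of (a), the arithmetic behind that expression can be modelled in Python. The hub addresses below are made up for the example; the only fact relied on is that a cog cell is one long (4 bytes), so a byte offset from the start of the cog image is divided by 4.

```python
COG_CELLS = 512  # a cog holds 512 longs

def cog_cell(label_hub_addr, cog_start_hub_addr):
    """Model of (@asmlabel - @startOfCOG) >> 2."""
    offset_bytes = label_hub_addr - cog_start_hub_addr
    if offset_bytes < 0 or offset_bytes % 4 != 0:
        raise ValueError("label is not on a long boundary inside the cog image")
    cell = offset_bytes >> 2
    if cell >= COG_CELLS:
        raise ValueError("label lies beyond the end of the cog")
    return cell

# Hypothetical hub addresses, chosen so that the label lands on cell $DA.
START_OF_COG = 0x0A40
TEXTMODE = START_OF_COG + 0xDA * 4

textmode_cell = cog_cell(TEXTMODE, START_OF_COG)
fits_in_byte = textmode_cell <= 0xFF
```

With these numbers the label resolves to cell $DA, which still fits in a byte-wide table entry.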

---
Edit (ad (a)):
The simplest COG code to utilize this would be:

:cogloop
  RDLONG theDest, PAR
  TJZ theDest, #:cogloop
  JMP theDest

--
Edit 2:
However this is considered bad non-defensive programming. There is no chance to check for fatal errors any longer.
In case of a nice sequence of numbers you can very simply direct faulty parameters to a graceful exit by the handy MIN and MAX instruction!
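A small Python model of that idea, assuming the usual PASM semantics (MIN raises an unsigned value to a lower limit, MAX caps it at an upper limit). The table layout, with one extra entry acting as the graceful exit, is hypothetical and only there to show the clamping.

```python
M32 = 0xFFFFFFFF

def pasm_min(d, s):
    """MIN d, s: limit the minimum of unsigned d to s."""
    return max(d & M32, s & M32)

def pasm_max(d, s):
    """MAX d, s: limit the maximum of unsigned d to s."""
    return min(d & M32, s & M32)

NUM_COMMANDS = 15                 # commands 1..15, as in the first post
EXIT_INDEX = NUM_COMMANDS + 1     # hypothetical extra entry: graceful exit

def sanitize(cmd):
    """Any command number above the valid range is sent to the exit entry."""
    return pasm_max(cmd, EXIT_INDEX)
```

A single MAX is enough here because the value is treated as unsigned, so a "negative" parameter shows up as a very large number and is capped as well.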

Post Edited (deSilva) : 2/9/2008 10:31:24 AM GMT

stevenmess2004 · 2008-02-09 10:26

could you use this?

:cogLoop RDLONG theDest, PAR wz
      if_nz  JMP theDest

deSilva · 2008-02-09 10:34

Of course

I did not do this as you need a JMP to :cogloop anyway.. Or don't you???

Now watch some magic!

ORG 0
    RDLONG theDest, PAR
    JMP theDest

But as you still need 3 cells it is just fancy humbug ...

---
Edit: You should never start such things...
What about:

ORG 0
    LONG 0
    RDLONG 0, PAR
    JMP 0

Edit2:

ORG 0
    RDWORD  $+2, PAR
    WRWORD zero, PAR   ' acknowledge request (wastes time but saves one cell)
    JMP #0-0

This utilizes that we KNOW that bits 9 to 17 are not used in JMP

Post Edited (deSilva) : 2/9/2008 10:58:59 AM GMT

stevenmess2004 · 2008-02-09 10:35

If you have

DAT 
someLabel someInstruction

and somewhere else

DAT
otherLabel word someLabel

then won't the word at @otherLabel hold the cog address of someLabel?

deSilva · 2008-02-09 10:47

Good thinking! So using a table as in your second box feeding the COG could look like this:

action := otherLabel

rather than the clumsy expression I gave in my posting

action := (@someLabel - @COG)>>2

stevenmess2004 · 2008-02-09 10:57

So - for assembly drivers that mainly get called from spin (like graphics.spin) you can actually move the jump table out of the cog freeing up some space for other stuff in the cog. What is the 'long 0' for at the start of the code in your edit? Did you mean to put it after the RDLONG because of pipelining?

stevenmess2004 · 2008-02-09 11:00

With all these crossing posts I think it's time for a forum feature request - have some kind of indicator that shows when someone is editing a page. OK, I know, it would be almost impossible, but I can wish, can't I?

deSilva · 2008-02-09 11:13

This is a bad habit of mine :-( I like to correct my spelling, and also to keep things compact...
However this is confusing with fast-paced postings, which however we do not have so often....

The LONG 0 at address 0 is the intermediate place for the destination address. It was placed there to show the systematic development to the third solution. The third solution was handicapped by the interleaved instruction fetch, so I used the unused slot to add the acknowledge instruction... This is in fact the most compact form.

When it comes to bytes and words even in SPIN, my original suggestion ((@.. - @..)>>2) when used in a central routine such as "setCommand" is shorter than a full word-table for labels... There is a break-even point, around 10 entries I should say.

stevenmess2004 · 2008-02-09 11:35

Making things right is not what most people would call a bad habit, in fact I think most would call it a good habit

I guess that speed would also be a consideration since @ is calculated at runtime. Looking at the manual you should be able to do

DAT
org 0
startLabel 
...some code...
someLabel

CON
  jmp1=(@someLabel-@startLabel)>>2

From page 278 of the manual it would seem that you can do this but I get an 'undefined symbol' error when I try to compile it.

deSilva · 2008-02-09 11:55

Missing no clanger, do we ?

Using an address STATICALLY is a different thing than COMPUTING this address, although both usages are indicated by the @ symbol.

Although it would make perfect sense to use a DIFFERENCE of addresses, the addresses themselves will be known only in a later phase of the compilation, whilst all CONSTANTS must be known in a very early phase.

So the compiler - lacking the intelligence to distinguish between those two situations - generally forbids the use of @name in constant contexts.

But there is an exception - as described on that page 278 you quoted. You can get a RELATIVE value for the later use with the @@ operator.

However this is allowed inside the DAT sections only and only for names defined in DAT sections!! Note you can do things there as:

MOV X, #@A
'or
MOV X, #@B-@A

The other two constant contexts (CON section and constant(...) ) are more limited; funnily there is a minor difference between those as well.

I think I'll try to find the things I have written about this

Post Edited (deSilva) : 2/9/2008 12:01:30 PM GMT

stevenmess2004 · 2008-02-09 21:27

Okay - this code should work. It seems that the compiler evaluates CON sections and constant(...) code sections at different times.

DAT
org 0
startLabel 
'...some code...
someLabel

CON
  'jmp1=(@someLabel-@startLabel)>>2   ' not accepted in a CON section

VAR
  word jmp1
PUB someMethod
  jmp1:=constant((@someLabel-@startLabel)>>2)   ' byte offset >> 2 = cog cell number

deSilva · 2008-02-09 21:44

stevenmess2004 said...
It seems that the compiler evaluates CON sections and constant(...) code sections at different times.

This is what I said

But I was not fully aware of the finer differences...
But the reason is simple: the constants in CON are bound to a name, which can be used in other constant context, e.g. to allocate a vector.... Which will change the address pattern...

The constants computed by constant(....) cannot be bound to a name, they are more "passive" so to speak. Note that constant(...) is allowed in PUB/PRI only!

I always wondered how Chip sorted out all those different situations.
It is fine when you need some feature, but it is a pain to EXPLAIN all those ad-hoc limitations to someone

Post Edited (deSilva) : 2/9/2008 9:51:26 PM GMT

Phil Pilgrim (PhiPi) · 2008-02-09 22:09

It would be nice if there were enough crossover between ASM and Spin to allow jmp1 := #somelabel. This syntax is consistent with ASM notation and makes it clear that the value is a cog address constant relative to its preceding ORG. No futzing with hub addresses would be required.

-Phil

stevenmess2004 · 2008-02-09 22:11

So - if we write a compiler, we can put it in the CON section if the compiler is smart enough to do some calculations (using some linear algebra). That could make for some really interesting situations. Like an array automatically being sized for some amount of data.

I think that I am going to have to see what I can do with graphics.spin.

deSilva · 2008-02-09 23:27

Phil Pilgrim (PhiPi) said...
It would be nice if there were enough crossover between ASM and Spin to allow jmp1 := #somelabel

What you suggest is a new (static) operator ("#"), requesting the COG cell number rather than the content ("somelabel") or the HUB address ("@somelabel")

BTW: I think there is a common misunderstanding... The # is not a part of the operand, but a part of the opcode. It is situated at the place just in front of the operand in question for simplicity and convenience.

Post Edited (deSilva) : 2/10/2008 7:55:50 AM GMT

stevenmess2004 · 2008-02-10 01:31

That would be nice. It shouldn't be too hard either. However, I don't think that Parallax have time to do this kind of thing at the moment. When is someone going to write an open source Spin compiler? Maybe I should start on it.

stevenmess2004 · 2008-02-10 02:31

Here is a version of graphics.spin modified using ideas from this thread. It's probably possible to optimize it further, but this saves 11 longs in the cog. Maybe enough for someone to add one more function? I'm having trouble uploading the file and it's time for lunch, so I'll update this with the file in a while. Now uploaded. I think that it is also slightly faster.

Steven

Post Edited (stevenmess2004) : 2/10/2008 3:02:00 AM GMT

stevenmess2004 · 2008-02-10 07:58

Just something interesting. The spin compiler is smart enough to only allocate a word for a constant if the constant will fit in a word. If it won't fit then it allocates a long. I tried to make the graphics driver smaller by removing the <<16 in the setcommand method and putting the <<16 in the constants and it actually made the program bigger although it should be slightly faster.

It may not actually be faster, because I don't think that all constants are byte aligned regardless of length. (I would have to check hippy's Spin documentation to be sure.)

So to sum up. If you want your program to be as small as possible, it may be best to make your constants fit in a byte or word and then manipulate them to the form you want.

Steven

deSilva · 2008-02-10 08:04

Good work. There is indeed an improvement I have wished for long:
At the moment all drawing is done with an opaque pen substituting the color. There are good reasons why "adult" graphics systems have more choice... The thing most often needed is XORing the colors, to make things undone. This will greatly simplify some flicker-free updating without double buffering!

I once patched it for that (that was when I found out it was using the complete COG memory up to the last cell) doing XOR only, but that was not acceptable..

The user interface could be through the colours, there are 4 at the moment 0,1,2,3.
Leaving some spares, an offset of 256 could mean XOR....

---
Edit:
As this is in the central drawing routine it can be done with 4 instructions I think...

Post Edited (deSilva) : 2/10/2008 8:24:16 AM GMT

deSilva · 2008-02-10 08:17

stevenmess2004 said...
The spin compiler is smart enough to only allocate a word for a constant if the constant will fit in a word.

This needs some more research (although most likely done by Hippy already..) When a "constant" is needed, an appropriate bytecode is generated followed by that constant. So we have learned there are most likely two (or three: B,W,L ?) of those codes.
Another implementation option would be to allocate "constants" in the RAM, just addressing them by indirection with static memory fetch bytecodes - the same as we do in PASM when a constant exceeds 9 bits. The great advantage is that you no longer need a set of special bytecodes for constants, and you can re-use the same constant value from different places... Hippy will know...

stevenmess2004 · 2008-02-10 09:03

From BYTECODE.txt in this thread by hippy http://forums.parallax.com/showthread.php?p=665019

hippy said...

34 PUSH #-1

The long word $FFFFFFFF is pushed to the stack.

35 PUSH #0

The long word $00000000 is pushed to the stack.

36 PUSH #1

The long word $00000001 is pushed to the stack.

37 PUSH #kp

The single byte following the PUSH opcode indicates the value of a 32-bit number
to push to the stack. The lower five bits ( bits 0 to bit 4 ) of the operand
byte is used to specify a power-of-two number ( 0 = $00000001, 1 = $00000002
through to $1F = $80000000 ). If bit 5 is set, that number is decremented. If
bit 6 is set, that number is inverted.

The purpose of bit 7 is not clear and has never been found to have been set by
the author during testing so far -- It may have some significance for negative
numbers where msb's may all be 1's ?

38 PUSH #k1

The byte following the PUSH opcode is taken as the 8-lsb's of a long number to
push to the stack. The top 24-bits are zeroed.

39 PUSH #k2

The two bytes following the PUSH opcode are taken as the 16-lsb's of
a long number to push to the stack. The top 16-bits are zeroed. The 16-bit
number is stored after the PUSH opcode MSB first.

3A PUSH #k3

The three bytes following the PUSH opcode are taken as the 24-lsb's of
a long number to push to the stack. The top 8-bits are zeroed. The 24-bit
number is stored after the PUSH opcode MSB first.

3B PUSH #k4

The four bytes following the PUSH opcode are taken as a 32-bit long number
to push to the stack. The 32-bit number is stored after the PUSH opcode MSB
first.

So it looks like you could actually have byte, word, 3 byte and long. Not sure what the compiler does though. So if your constant is bigger than a word and used multiple times, then it may be smaller to put something in the DAT section and use that as a constant. It may also be quicker, as the proper constants may need more hub accesses. Something to consider is the PUSH -1, 0 and 1 opcodes, which will really speed up a lot of things.
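To get a feel for the sizes involved, here is a Python sketch built only on hippy's descriptions quoted above. Two things are assumptions: that the invert step of PUSH #kp applies after the decrement, and that the compiler always picks the shortest form (which, as said, is not known).

```python
M32 = 0xFFFFFFFF

def decode_kp(operand):
    """Expand the operand byte of PUSH #kp (opcode $37)."""
    value = 1 << (operand & 0x1F)      # bits 0..4 select a power of two
    if operand & 0x20:                 # bit 5: decrement
        value -= 1
    if operand & 0x40:                 # bit 6: invert (assumed to come last)
        value = ~value
    return value & M32

# Every value reachable through PUSH #kp; bit 7 is ignored, its purpose is unclear.
KP_VALUES = {decode_kp(op) for op in range(0x80)}

def push_size(value):
    """Bytes used (opcode + operand) if the shortest listed PUSH form is chosen."""
    value &= M32
    if value in (M32, 0, 1):
        return 1                       # $34, $35, $36 carry no operand
    if value in KP_VALUES or value <= 0xFF:
        return 2                       # $37 or $38: one operand byte
    if value <= 0xFFFF:
        return 3                       # $39
    if value <= 0xFFFFFF:
        return 4                       # $3A
    return 5                           # $3B
```

Under this model a command number such as 3 costs 2 bytes while the pre-shifted 3<<16 costs 4, which would fit the observation that moving the <<16 into the constants made the program bigger.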

Phil Pilgrim (PhiPi) · 2008-02-10 09:12

deSilva said...
The # is not a part of the operand, but a part of the opcode.

I think that's what a good attorney would call a "distinction without a difference".

Without regard to where, exactly, the distinguishing bit is located in the instruction, the end effect is the same.

-Phil

stevenmess2004 · 2008-02-10 09:30

deSilva said...
Good work. There is indeed an improvement I have wished for long:
At the moment all drawing is done with an opaque pen substituting the color. There are good reasons why "adult" graphics systems have more choice... The thing most often needed is XORing the colors, to make things undone. This will greatly simplify some flicker-free updating without double buffering!

I once patched it for that (that was when I found out it was using the complete COG memory up to the last cell) doing XOR only, but that was not acceptable..

The user interface could be through the colours, there are 4 at the moment 0,1,2,3.
Leaving some spares, an offset of 256 could mean XOR....

Okay, we have 15 longs in the cog to play with. I think that we could add another 2 commands to set which to do. (commands would be setOverwrite, setXOR).
The code to draw standard pixels is

                        rdlong  t2,t1                   'write pixel
                        andn    t2,mask0
                        or      t2,bits0
                        wrlong  t2,t1

So to only do the XOR we can just make the andn a nop and change it back when we want to do overwrite.
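The difference between the two pixel modes can be sketched in Python. The names mirror the registers above, and the sample values are made up; it is assumed that bits0 only has bits set inside mask0.

```python
M32 = 0xFFFFFFFF

def plot_overwrite(old, mask, bits):
    """rdlong / andn / or / wrlong: the pen replaces whatever was there."""
    return ((old & ~mask) | bits) & M32

def plot_xor(old, bits):
    """Same sequence with the andn skipped and the or turned into an xor."""
    return (old ^ bits) & M32

# Made-up sample: one screen long, one 2-bit pixel at bits 9..8.
old = 0x12345678
mask0 = 0x00000300
bits0 = 0x00000100

drawn = plot_xor(old, bits0)
undone = plot_xor(drawn, bits0)   # plotting the same thing again restores the long
```

Drawing the same shape twice in XOR mode gives back the original screen contents, which is what makes erasing without a second buffer possible. The overwrite mode has no such inverse because the old pixel value is lost.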

Where we will have problems is that this will also need changing in the same way

'
'
' Plot wide pixel slice
'
wslice                  shl     t1,#2                   'ready long offset

                        add     base0,t1                'plot left slice
                        test    jumps,#%01      wc
        if_c            rdlong  t2,base0
        if_c            andn    t2,mask0
        if_c            or      t2,bits0
        if_c            wrlong  t2,base0

                        add     base1,t1                'plot right slice
                        test    jumps,#%10      wc
        if_c            rdlong  t2,base1
        if_c            andn    t2,mask1
        if_c            or      t2,bits1
        if_c            wrlong  t2,base1

                        sub     base0,t1                'restore bases
                        sub     base1,t1

wslice_ret              ret

So we would need three instructions + a return, so that's 4 instructions for each change plus 2 longs to store the masks in. So that comes to a total of 4+4+2=10 longs, which will fit.

If you agree with this I'll see if I can get it working.

deSilva · 2008-02-10 09:51

I wasn't aware of the "pixel slice optimization" :-( This makes it more complex. But "fait ton jeu"

I have a test program here somewhere for the XOR mode... but where??

@ byte code: I still cannot believe this sophistication... I think I should have needed 10% of the COG alone to implement all those constant push operations

Post Edited (deSilva) : 2/10/2008 9:57:54 AM GMT

stevenmess2004 · 2008-02-10 10:02

deSilva, If you have some code already that would be great.

deSilva · 2008-02-10 10:09

No, not for the GRAPHICS itself

It is a small test that uses the "patched" version. The patch was exactly done as you describe it: I removed the ANDN and substituted the OR by XOR.

What will be your API for doing this dynamically? Pseudocolors (256+c) ? Or a new call? Each long counts, I think

stevenmess2004 · 2008-02-10 10:38

I was thinking about a new call (may need two). So for all of them we can change the following.

To change to XOR:
1. the ANDN to a NOP, by doing an AND with a mask that sets the condition bits to 0 - needs 1 long for the instruction; the mask can be passed by the calling function as an argument
2. the OR to an XOR, by using a MOVI - needs 1 long for the instruction; the new instruction bits can be passed as an argument

So we need 2 + 1 return = 3 longs to change one position to XOR.

To change back:
3. the NOP back to an ANDN, by using an XOR with a mask that sets the condition bits to whatever is needed - needs 1 long for the instruction; again the mask is passed as an argument
4. the XOR to an OR, by using a MOVI - needs 1 long for the instruction; the new instruction bits can be passed as an argument

So we need 2 + 1 return = 3 longs to change one position back.

The spin code can look after doing it 3 times and making sure the correct mask for the condition bits is used.

So we only need 6 longs in the cog. That almost leaves room for something else, but someone else can do that.
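Here is a Python model of the patching scheme, as a sketch only. The field positions follow the PASM instruction layout (condition in bits 21..18, MOVI rewriting bits 31..23); the opcode patterns are quoted from memory of the instruction table and should be checked against the manual. The helpers model the net effect of each patch, not the exact mask instructions.

```python
M32 = 0xFFFFFFFF
COND_MASK = 0xF << 18          # condition field, bits 21..18
TOP9_MASK = 0x1FF << 23        # opcode + Z, C, R bits: the part MOVI rewrites

# Opcode + ZCR patterns (assumption, verify against the instruction table).
ANDN_ZCR = 0b011001_001
OR_ZCR = 0b011010_001
XOR_ZCR = 0b011011_001

def make_instr(top9, cond, dest, src):
    """Assemble an instruction word; the I bit (bit 22) is left at 0."""
    return (top9 << 23) | (cond << 18) | (dest << 9) | src

def set_cond(instr, cond):
    """Net effect of masking the condition field; cond 0 means 'never'."""
    return (instr & ~COND_MASK & M32) | ((cond & 0xF) << 18)

def movi(instr, top9):
    """Model of MOVI: replace bits 31..23, leave the rest alone."""
    return (instr & ~TOP9_MASK & M32) | ((top9 & 0x1FF) << 23)

def to_xor_mode(andn_instr, or_instr):
    return set_cond(andn_instr, 0), movi(or_instr, XOR_ZCR)

def to_overwrite_mode(andn_instr, xor_instr, cond):
    return set_cond(andn_instr, cond), movi(xor_instr, OR_ZCR)
```

The caller has to supply the original condition when switching back, which is why the Spin side needs to know which of the three places it is patching.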

deSilva · 2008-02-10 11:20

You can leave the condition bits as they are: NOP is NOP and doesn't care for them. You can also change the condition bits only (to "never").

But you have to do it twice: in wslice AND pixel...

There is no need for doing it "three times"

There is a strict order of execution in the COG...

stevenmess2004 · 2008-02-10 11:25

deSilva, there is no NOP instruction. The NOP instruction is simulated by setting all the condition bits to zero.

The three times is for the three different places. We need to change it in one place in the pixel routine and in two places in the wide pixel routine, making a total of three. If we pass the cog address of the function as an argument from Spin then we don't use any extra instructions in the assembler. I know it's not as fast, but we need the room.

Steven

deSilva · 2008-02-10 11:43

Right, I misunderstood!
So there is no NOP instruction

*Still flushing....*

However the "R" bit is part of the first 9 bits, so action can be avoided by NR!
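A tiny Python model of that remark, assuming the R (write result) bit sits at bit 23 of the instruction word, i.e. at the bottom of the 9-bit field that MOVI replaces. The sample instruction word is made up.

```python
R_BIT = 1 << 23                # "write result" bit

def with_nr(instr):
    """Clear R: the instruction still runs but its result is discarded."""
    return instr & ~R_BIT

def with_wr(instr):
    """Set R again: the result is written back to the destination."""
    return instr | R_BIT

def top9(instr):
    """The field MOVI touches: opcode (6 bits) followed by Z, C and R."""
    return (instr >> 23) & 0x1FF

# Made-up instruction word: some opcode with R set, condition 'always'.
sample = (0b011001_001 << 23) | (0xF << 18) | (5 << 9) | 6
silenced = with_nr(sample)
```

Only one bit of the top field changes, so the condition bits can stay as they are.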

Post Edited (deSilva) : 2/10/2008 11:51:57 AM GMT
