Second-level P(2)ASM namespacing

Wuerfel_21 · 2025-02-01 22:33

Imagine yourself in this place... you're writing a Spin file that contains multiple different PASM programs (either because it is a PASM-only program or it just is like that for other reasons)

DAT ' Program A
              org 0
a_entry
              mov temp,#0
              mov outa,#15
.loop
              add temp,#1
              mov outa,temp
              jmp #.loop

temp          res 1
              fit 496

DAT ' Program B
              org 0
b_entry
.loop
              rdlong temp,ptrb[0]
              qdiv temp,#3
              getqx temp
              wrlong temp,ptrb[1]
              jmp #.loop

temp          res 1
              fit 496

and oops, both of them need a temp symbol...

Currently the solution to this is either to prefix every symbol with its scope, i.e. a_temp and b_temp, which is annoying and error-prone (check out MisoYume, which ends up having ppr_tmp1, ppc_tmp1 and ppm_tmp1 just in the PPU render->composite->math pipeline), or to put the code in separate files (not possible with PASM-only programs, and annoying if you do need to share some code/data between cogs).

There really ought to be a higher-level namespace system that allows each sub-program to have its own cog symbols. These of course need to be accessible from elsewhere. Flexspin already allows accessing local symbols with a colon (i.e. a_entry:loop), so that idea could be extended to namespaces like a:entry or even a:entry:loop.

Points worth discussing:

How to define a namespace? (Note here that namespaces are not singular, you may have multiple discontiguous sections of code that share the same register layout and thus the same namespace)
Compatibility with the existing language implies a "default" namespace, whose members are all globally exposed (as is currently the case for all non-local labels)
Syntactical specifics

Tagging @cgracey @ersmith @macca for discussion

macca · 2025-02-02 08:50

@Wuerfel_21 said:
Imagine yourself in this place... you're writing a Spin file that contains multiple different PASM programs (either because it is a PASM-only program or it just is like that for other reasons)

[...]

and oops, both of them need a temp symbol...

Currently the solution to this is either to prefix every symbol with its scope, i.e. a_temp and b_temp, which is annoying and error-prone (check out MisoYume, which ends up having ppr_tmp1, ppc_tmp1 and ppm_tmp1 just in the PPU render->composite->math pipeline), or to put the code in separate files (not possible with PASM-only programs, and annoying if you do need to share some code/data between cogs).

With relatively simple programs, one solution may be to always use local variables (prefixed with a dot) within a high-level procedure. In your example, if you change temp to .temp I think you get what you want, but this is a really simple example and may not be applicable in all situations. The most obvious disadvantage is that you can't share variables: locals go out of scope when a non-dot-prefixed label is encountered.
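
The scoping rule described here can be modelled in a few lines of Python. This is only a toy model, not any compiler's actual implementation: `scope_labels` is a hypothetical helper that tracks nothing but label names in source order, and the label list assumes temp has been renamed to .temp in both example programs.

```python
def scope_labels(labels):
    """Pair every label with the global label whose scope it falls in.

    `labels` is the sequence of label names in source order. A name starting
    with "." is local and belongs to the most recent non-dot label; any
    non-dot label opens a new scope, which ends the previous one.
    """
    scoped = []
    current_global = None  # no global label seen yet
    for label in labels:
        if label.startswith("."):
            scoped.append((current_global, label))
        else:
            current_global = label
            scoped.append((label, None))
    return scoped


# Labels of the two example programs, with temp changed to .temp
labels = ["a_entry", ".loop", ".temp", "b_entry", ".loop", ".temp"]
scoped = scope_labels(labels)

# No collisions: each .temp is tied to its own entry label.
assert len(set(scoped)) == len(scoped)
```

The same model shows the limitation: a .temp defined under a_entry is paired with a_entry only, so code after a second global label in Program A could no longer refer to it.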

In the past I was thinking about a "DAT-scoped label", prefixed with, I don't know, a colon or underscore, or some other combination: a label that is local to the DAT section, so you can have different DAT sections with the same label names. Traditional assemblers have directives for labels, like ".local name" or ".global name" (I can't remember exactly, it's been a long time since my last assembler programming...), which may be a way to define label scopes without using cumbersome prefixes.

One way that is relatively quick to implement is to have PASM-only objects and reference public PASM labels with the same syntax used for Spin methods (objname.label). This may be consistent with the Spin object concept.
In my compiler I can have OBJs in PASM-only programs; it just doesn't export the labels.

Wuerfel_21 · 2025-02-02 19:03

@macca said:
With relatively simple programs, one solution may be to always use local variables (prefixed with a dot) within a high-level procedure. In your example, if you change temp to .temp I think you get what you want, but this is a really simple example and may not be applicable in all situations. The most obvious disadvantage is that you can't share variables: locals go out of scope when a non-dot-prefixed label is encountered.

Exactly, works for the simple example, not viable for large programs where it actually matters.

One way that is relatively quick to implement is to have PASM-only objects and reference public PASM labels with the same syntax used for Spin methods (objname.label). This may be consistent with the Spin object concept.
In my compiler I can have OBJs in PASM-only programs; it just doesn't export the labels.

I feel like that defeats the point of PASM-only mode (manual memory layout)

macca · 2025-02-03 06:42

@Wuerfel_21 said:

One way that is relatively quick to implement is to have PASM-only objects and reference public PASM labels with the same syntax used for Spin methods (objname.label). This may be consistent with the Spin object concept.
In my compiler I can have OBJs in PASM-only programs; it just doesn't export the labels.

I feel like that defeats the point of PASM-only mode (manual memory layout)

Maybe it will be a bit more complicated, but it is a way to split the source. Even with all the help from the editor, a big source file is always a nightmare to handle (and you have to copy/paste the same modules into other projects).

What about DAT section names, like PUB/PRI method names?

DAT ' Entry point
              org 0
start
              coginit #1, ##@vga.start
              coginit #2, ##@hdmi.start
              ...

DAT vga ' VGA Driver
              org 0
start
              ...

DAT hdmi ' HDMI Driver
              org 0
start
              ...

The section name acts as the namespace: labels are automatically prefixed with the section name, anything outside the section must use the name.label syntax, and inside the DAT section nothing changes.
Unnamed DATs work unchanged (no namespace prepended, or an empty namespace).

This may have the advantage of requiring just an #include statement to split the source into multiple reusable source modules.

VonSzarvas · 2025-02-03 09:26

@macca said:
What about DAT section names, like PUB/PRI method names?

On the surface I rather like that approach.
It also seems like this would not impact any existing code (considering existing code wouldn't have an alias/name after the DAT statement in any compilers?).

Wuerfel_21 · 2025-02-03 13:32

Something like that would work, yea. Though it's not 100% compatible, as it's valid to start writing DAT statements immediately after the section label (you can also do this with CON, OBJ and VAR). Not sure how many people actually do this.

DAT long 0

macca · 2025-02-03 15:48

I've often seen it used to set the org, less often (if ever) for instructions.

Compatibility may be increased with some rules, e.g. section names are single words with nothing after them except comments, and are not instructions/statements.

macca · 2025-02-09 09:04

FYI, I have experimented a bit with the DAT names, this is the result:

DAT
                org $000

                coginit #1, #@proga.entry
                coginit #2, #@progb.entry

DAT proga
                org $000
entry
                mov     a, #a
                mov     a, #a_alias
                mov     b, #b
                mov     c, #c
                ret

a_alias
a               long    0
b               res     1
c

DAT progb
                org $010
entry
                mov     a, #a
                mov     a, #a_alias
                mov     b, #b
                mov     c, #c
                ret

a_alias
a               long    0
b               res     1
c

The listing produced by the code above (the org values are to check that it picks the right label address):

00000 00000   000                                    org     $000
00000 00000   000 08 02 EC FC                        coginit #1, #@proga.entry
00004 00004   001 20 04 EC FC                        coginit #2, #@progb.entry
00008 00008   002                                    org     $000
00008 00008   000                entry               
00008 00008   000 05 0A 04 F6                        mov     a, #a
0000C 0000C   001 05 0A 04 F6                        mov     a, #a_alias
00010 00010   002 06 0C 04 F6                        mov     b, #b
00014 00014   003 07 0E 04 F6                        mov     c, #c
00018 00018   004 2D 00 64 FD                        ret
0001C 0001C   005                a_alias             
0001C 0001C   005 00 00 00 00    a                   long    0
00020 00020   006                b                   res     1
00020 00020   007                c                   
00020 00020   007                                    org     $010
00020 00020   010                entry               
00020 00020   010 15 2A 04 F6                        mov     a, #a
00024 00024   011 15 2A 04 F6                        mov     a, #a_alias
00028 00028   012 16 2C 04 F6                        mov     b, #b
0002C 0002C   013 17 2E 04 F6                        mov     c, #c
00030 00030   014 2D 00 64 FD                        ret
00034 00034   015                a_alias             
00034 00034   015 00 00 00 00    a                   long    0
00038 00038   016                b                   res     1
00038 00038   017                c

Looks good to me.
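
The addresses in the listing are consistent with each named DAT section keeping its own symbol table. A minimal Python sketch of that lookup rule follows. It is a toy model rather than Spin Tools' actual implementation: `qualify`, `build_symbols` and `resolve` are hypothetical names, dot-local labels are ignored, and the cog addresses are copied from the third column of the listing.

```python
def qualify(section, label):
    """Full symbol name for a label defined in DAT section `section` ("" = unnamed)."""
    return f"{section}.{label}" if section else label


def build_symbols(sections):
    """Build one flat table of qualified names from (section, {label: address}) pairs."""
    symbols = {}
    for section, labels in sections:
        for label, address in labels.items():
            name = qualify(section, label)
            if name in symbols:
                raise ValueError(f"duplicate symbol: {name}")
            symbols[name] = address
    return symbols


def resolve(symbols, current_section, ref):
    """Look up `ref` as written inside `current_section`.

    A reference containing a dot is taken as an explicit name.label;
    anything else is looked up in the section it appears in.
    """
    name = ref if "." in ref else qualify(current_section, ref)
    return symbols[name]


# Cog addresses taken from the listing above.
SECTIONS = [
    ("proga", {"entry": 0x000, "a_alias": 0x005, "a": 0x005, "b": 0x006, "c": 0x007}),
    ("progb", {"entry": 0x010, "a_alias": 0x015, "a": 0x015, "b": 0x016, "c": 0x017}),
]
SYMBOLS = build_symbols(SECTIONS)

# Inside each section the bare name picks that section's own label.
assert resolve(SYMBOLS, "proga", "a") == 0x005
assert resolve(SYMBOLS, "progb", "a") == 0x015
# From the unnamed DAT, the name.label form is required.
assert resolve(SYMBOLS, "", "progb.entry") == 0x010
```

Both sections define entry, a_alias, a, b and c, and the table still has no duplicates because every key carries its section name.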

Also, I forgot that I have a bunch of tests with instructions on the same DAT line, and they are all passing, so backward compatibility seems good.
I think PNut will take the name as a label, then choke because of the duplicated names, but if using PNut the source should not have duplicated labels anyway.

I have to fix some details, but I think it will debut in the next Spin Tools release.

Wuerfel_21 · 2025-02-09 13:09

looks good
Will need to look into making my code compile with spintools or getting it added to flexspin...

ersmith · 2025-02-10 12:32

@macca said:
I've often seen it used to set the org, less often (if ever) for instructions.

Compatibility may be increased with some rules, e.g. section names are single words with nothing after them except comments, and are not instructions/statements.

How about adding a marker to make the name explicit? Something like:

DAT::namespace

That way there's no ambiguity.

Wuerfel_21 · 2025-02-10 12:49

Probably a good idea. The double colon seems a bit out of place, not a token that exists anywhere else.
Maybe

DAT (namespace)

macca · 2025-02-10 14:39

@ersmith said:
How about adding a marker to make the name explicit? Something like:
DAT::namespace
That way there's no ambiguity.

PASM lines have a well defined format:

[label] [condition] instruction [parameters] [effect]

A simple check for condition and instruction will solve the backward compatibility.
The only ambiguity is when there is only a label

DAT label

How many are using that? I hope nobody...

ersmith · 2025-02-10 15:03

@macca said:
A simple check for condition and instruction will solve the backward compatibility.
The only ambiguity is when there is only a label

DAT label

How many are using that? I hope nobody...

But why should we just "hope"? This is a new feature, so there's no reason not to introduce some kind of unique syntax marker to make sure conflict is impossible. This should also make it easier for tools like VS Code to parse.

I suggested DAT::name in analogy to C++'s namespaces. Ada preferred DAT (name). We could also do DAT #name or DAT ^name or any single character, really.
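
To illustrate why a marker is easy on tools, here is a minimal Python sketch that recognises the two marker spellings floated so far, DAT::name and DAT (name), with a single regular expression and no instruction table. `MARKED_DAT` and `parse_marked_dat` are hypothetical names, and the handling of optional whitespace and a trailing ' comment is an assumption, not something any compiler has settled on.

```python
import re

# Accepts the two marker spellings floated in the thread: DAT::name and DAT (name).
# Allowing optional whitespace and a trailing ' comment is an assumption.
MARKED_DAT = re.compile(
    r"^\s*DAT\s*(?:::\s*([A-Za-z_]\w*)|\(\s*([A-Za-z_]\w*)\s*\))\s*(?:'.*)?$",
    re.IGNORECASE,
)


def parse_marked_dat(line):
    """Return the namespace name if the line is a marked DAT header, else None."""
    match = MARKED_DAT.match(line)
    if match is None:
        return None
    return match.group(1) or match.group(2)


assert parse_marked_dat("DAT::namespace") == "namespace"
assert parse_marked_dat("DAT (namespace)") == "namespace"
# Unmarked lines keep their current meaning, so nothing existing changes.
assert parse_marked_dat("DAT long 0") is None
assert parse_marked_dat("DAT label") is None
```

With a marker, the formerly ambiguous DAT label line is never mistaken for a namespace, because only the marked forms match.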

macca · 2025-02-10 15:21

@ersmith said:
But why should we just "hope"? This is a new feature, so there's no reason not to introduce some kind of unique syntax marker to make sure conflict is impossible. This should also make it easier for tools like VS Code to parse.

PNut uses keyword gating to enable new keywords; you can do the same to enable DAT names and ensure 100% compatibility.

ersmith · 2025-02-10 15:44

@macca said:

@ersmith said:
But why should we just "hope"? This is a new feature, so there's no reason not to introduce some kind of unique syntax marker to make sure conflict is impossible. This should also make it easier for tools like VS Code to parse.

PNut uses keyword gating to enable new keywords; you can do the same to enable DAT names and ensure 100% compatibility.

It's not only the compatibility that I'm trying to improve, but also the ease of parsing. Spin2 already has all kinds of exceptions and ambiguities that a parser has to work around. I'd rather not add more. Putting in a marker of some kind removes the ambiguity completely. What is your objection to this? I realize you've probably already implemented it without the marker, and you'd rather not do more work, but my preference is to get things right before making it a cross-compiler standard.

You could always leave the markerless version in Spin Tools IDE as an extension, if you'd like (with the tiny risk of backwards incompatibility).

ersmith · 2025-02-10 15:54

Ultimately I guess it will be up to @cgracey and @"Stephen Moraco" how and whether to put this in the "official" compiler, so I'd be interested in their thoughts.

macca · 2025-02-10 16:36

@ersmith said:
It's not only the compatibility that I'm trying to improve, but also the ease of parsing. Spin2 already has all kinds of exceptions and ambiguities that a parser has to work around. I'd rather not add more. Putting in a marker of some kind removes the ambiguity completely. What is your objection to this? I realize you've probably already implemented it without the marker, and you'd rather not do more work, but my preference is to get things right before making it a cross-compiler standard.

Honestly, I don't understand what the difficulties are in parsing a label... I mean, I guess you already parse the DAT line for instructions, so get the next token: if it is a label (not a condition nor an instruction), it may be a label or the namespace. Then get the next token: if it is a condition or an instruction, proceed to parameters and effects (not a namespace); otherwise it is the namespace (and throw an error if there are more keywords). This is needed only for the DAT line itself.
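
The token check described above can be sketched in Python as follows. This is a simplified illustration, not the Spin Tools parser: `classify_dat_line` is a hypothetical function, the condition and instruction tables are tiny placeholder subsets, and comment stripping only handles a trailing ' comment.

```python
# Placeholder subsets; a real assembler would use its full keyword tables.
CONDITIONS = {"if_z", "if_nz", "if_c", "if_nc", "_ret_"}
INSTRUCTIONS = {"org", "fit", "res", "long", "word", "byte",
                "mov", "add", "jmp", "ret", "coginit"}


def is_statement_start(token):
    """True if the token is a condition or an instruction/directive."""
    token = token.lower()
    return token in CONDITIONS or token in INSTRUCTIONS


def classify_dat_line(line):
    """Classify the DAT line itself as plain, statement or namespace.

    Returns (kind, name): name is the namespace for "namespace", the
    leading label (if any) for "statement", and None for "plain".
    """
    code = line.split("'", 1)[0]  # drop a trailing ' comment (simplified)
    tokens = code.split()
    if not tokens or tokens[0].lower() != "dat":
        raise ValueError("not a DAT line")
    rest = tokens[1:]
    if not rest:
        return ("plain", None)
    if is_statement_start(rest[0]):
        return ("statement", None)          # e.g. DAT long 0
    # rest[0] is an identifier: either a label or the namespace.
    if len(rest) == 1:
        return ("namespace", rest[0])       # nothing follows: namespace
    if is_statement_start(rest[1]):
        return ("statement", rest[0])       # label followed by a statement
    raise ValueError("unexpected token after section name")


assert classify_dat_line("DAT long 0") == ("statement", None)
assert classify_dat_line("DAT vga ' VGA Driver") == ("namespace", "vga")
assert classify_dat_line("DAT label") == ("namespace", "label")
```

The last assertion is the one incompatible case discussed in this thread: an existing source with a lone label on the DAT line would now be read as a namespace.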

I found the compiler part more difficult to implement.

You could always leave the markerless version in Spin Tools IDE as an extension, if you'd like (with the tiny risk of backwards incompatibility).

Having already implemented it is one reason, but to me the DAT name is simple and clean, has 99.9% backward compatibility without the need for keyword gating, and has some consistency with PUB/PRI methods (not that it really matters). I don't like prefixes or other cumbersome specifiers at all, since they are then not consistent with each other.

I would like to implement PASM-only objects for this; however, it is more difficult than expected (well, some aspects of it, for example taking the address of a label, because child objects are compiled after the main object, at least in my compiler).

If you want easy parsing, we may simply add a new PASM keyword/directive, something like name or namesp (a short name for consistent formatting) to define the namespace.

DAT
                org $000

                coginit #1, #@proga.entry
                coginit #2, #@progb.entry

DAT
                org     $000
                namesp  proga ' namespace
entry
                mov     a, #a
                mov     a, #a_alias
                mov     b, #b
                mov     c, #c
                ret

a_alias
a               long    0
b               res     1
c

DAT
                org     $010
                namesp  progb ' namespace
entry
                mov     a, #a
                mov     a, #a_alias
                mov     b, #b
                mov     c, #c
                ret

a_alias
a               long    0
b               res     1
c

ersmith · 2025-02-10 23:00

@macca said:
Honestly, I don't understand what the difficulties are in parsing a label... I mean, I guess you already parse the DAT line for instructions, so get the next token: if it is a label (not a condition nor an instruction), it may be a label or the namespace. Then get the next token: if it is a condition or an instruction, proceed to parameters and effects (not a namespace); otherwise it is the namespace (and throw an error if there are more keywords). This is needed only for the DAT line itself.

If the parser is written by hand it's probably easy. But if the parser is created by a tool like Antlr or Yacc, it needs an unambiguous grammar for the language, and for Spin2 that's already really hard. I don't think we should make it any harder.

If you want easy parsing, we may simply add a new PASM keyword/directive, something like name or namesp (a short name for consistent formatting) to define the namespace.

That's a great idea, it's not only compatible but can be put on the same line as the DAT like DAT namesp proga.

@cgracey any thoughts?

macca · 2025-02-11 06:45

@ersmith said:
If the parser is written by hand it's probably easy. But if the parser is created by a tool like Antlr or Yacc, it needs an unambiguous grammar for the language, and for Spin2 that's already really hard. I don't think we should make it any harder.

Oh, yes I know what you mean. I tried to use antlr in the early stages of the compiler, but mostly because of my inexperience with it (never used antlr before), I found that it was easier to code the parser by hand rather than having to deal with endless exceptions of the Spin language.
