New Unscrambled P1V ROM with Faster Spin Interpreter — Parallax Forums

New Unscrambled P1V ROM with Faster Spin Interpreter

Cluso99 Posts: 18,069
edited 2015-04-18 02:55 in Propeller 1
18th April 2015: See Post #16 for the latest files.

See Post #3 for latest files, including ROM files for both the P1 Spin Interpreter, and (my) Cluso's Faster Spin Interpreter.

Here are Verilog files, modified from the latest git files, that use unscrambled ROMs (also posted)...

rom_low_v028.hex (hub $8000-$BFFF)
rom_high_v028.hex (hub $C000-$FFFF)

hub.v (unscrambling removed as no longer required with the above hex files)
hub_mem.v (use new hex files)

I have removed these files - see post #3 for the latest files

Comments

  • Cluso99 Posts: 18,069
    edited 2015-04-10 18:19
    I think I zipped the files after I changed hub_mem.v.
    Line 99 should be uncommented and line 100 commented out, thus...
    // (* ram_init_file = "hub_rom_high.hex" *)    reg [31:0] rom_high [4095:0]; // scrambled
    (* ram_init_file = "rom_high_v028.hex" *)      reg [31:0] rom_high [4095:0]; // unscrambled 
    // (* ram_init_file = "rom_cluso_v028.hex" *)  reg [31:0] rom_high [4095:0]; // cluso interp unscr
    
  • Cluso99 Posts: 18,069
    edited 2015-04-11 15:57
    Here is my latest set of files...

    hub.v & hub_mem.v are Verilog files for using unscrambled roms.

    rom_low_v030.hex, rom_high_v030.hex & rom_cluso_v030.hex are unscrambled ROMs: the lower ROM ($8000..$BFFF), the upper ROM ($C000..$FFFF), and my new upper ROM ($C000..$FFFF) using my "Faster Spin Interpreter". WARNING: My ClusoInterpreter overwrites a section of the SIN table from $EC00..$EFFF, so the SIN table is effectively unusable!

    roms_v030_20150412.zip
    hub_v030_20150412_unscrambled.zip
  • David Betz Posts: 14,516
    edited 2015-04-11 17:50
    Can this faster Spin interpreter run on a real P1 or does it require modifications to P1V?
  • Cluso99 Posts: 18,069
    edited 2015-04-11 19:47
    It runs on a real P1 too although it's a little convoluted to hook it in to run. Same code I released in 2008.
    There are no code mods to the P1V other than removing the de-scrambler and, of course, a new default ROM (which steals 1KB from the SIN table).

    However, some prelim tests I have done are not showing the speed-up I had calculated :(
    But there is spare space to make further improvements to the interpreter.
  • Cluso99 Posts: 18,069
    edited 2015-04-14 04:38
    Here are some results of my Faster Spin Interpreter. Unfortunately not as good as I had hoped and thought. Still there is improvement, and room to improve further.
    ROM -->		P1	P1	Cluso	Faster
    Chip		P1	P1V	P1V	
    Overhead	1248	1248	1200	3.8%
    i++		336	336	304	9.5%
    i--		336	336	304	9.5%
    ++i		336	336	320	4.8%
    --i		336	336	320	4.8%
    ~i		352	352	320	9.1%
    ~~j		352	352	320	9.1%
    i~		336	336	304	9.5%
    i~~		336	336	304	9.5%
    -i		416	416	384	7.7%
    !i		416	416	368	11.5%
    ||i		416	416	384	7.7%
    >|i		656	656	608	7.3%
    |<i		416	416	368	11.5%
    NOT j		448	448	368	17.9%
    ?i		864	864	832	3.7%
    i?		864	864	832	3.7%
    i ->= j		608	608	592	2.6%
    i <-= j		608	608	592	2.6%
    i >>= j		608	608	592	2.6%
    i <<= j		608	608	592	2.6%
    i #>= j		608	608	592	2.6%
    i <#= j		608	608	592	2.6%
    i &= j		608	608	592	2.6%
    i |= j		608	608	592	2.6%
    i ^= j		608	608	592	2.6%
    i += j		608	608	592	2.6%
    i -= j		608	608	592	2.6%
    i ~>= j		608	608	592	2.6%
    i ><= j		608	608	576	5.3%
    i AND= j	608	608	608	0.0%
    i OR= j		608	608	608	0.0%
    i *= j		1168	1168	1152	1.4%
    i **= j		1168	1168	1152	1.4%
    i /= j		1168	1168	1152	1.4%
    i //= j		1168	1168	1152	1.4%
    ^^i		1056	1056	752	28.8%
    if i == j then	704	704	688	2.3%
    if i <  j then	704	704	688	2.3%
    if i >  j then	704	704	688	2.3%
    if i <> j then	704	704	688	2.3%
    if i =< j then	704	704	688	2.3%
    if i => j then	704	704	688	2.3%
    
    I ran my ROM to hex file generator (removing the fdx output) and it ran 10% faster. Also the shift instructions that unscramble the ROM ran 10% faster.
    As you can see, I haven't tested subroutines/repeats etc, just the basic mathops.

    I was pleased to see the P1 and the P1V with P1 ROMs performed identically, as they should.
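
The "Faster" column in the table above appears to be the reduction in the measured count relative to the P1 ROM figure, i.e. (P1 - Cluso) / P1. Here is a minimal Python sketch under that assumption; the helper name `percent_faster` is made up for illustration and is not part of the posted files.

```python
def percent_faster(baseline: int, improved: int) -> float:
    """Reduction relative to the baseline, as a percentage rounded to one decimal."""
    return round(100.0 * (baseline - improved) / baseline, 1)


# (operation, P1 ROM count, Cluso ROM count), copied from the table above
rows = [
    ("Overhead", 1248, 1200),
    ("i++", 336, 304),
    ("NOT j", 448, 368),
    ("^^i", 1056, 752),
    ("i AND= j", 608, 608),
]

for name, p1, cluso in rows:
    print(f"{name:10s} {percent_faster(p1, cluso):5.1f}%")
```

For these rows the assumed formula gives the same values as the posted column (3.8, 9.5, 17.9, 28.8 and 0.0).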
  • jmg Posts: 15,173
    edited 2015-04-14 14:48
    Cluso99 wrote: »
    Here are some results of my Faster Spin Interpreter. Unfortunately not as good as I had hoped and thought. Still there is improvement, and room to improve further.
    At least in no case is it slower :)

    What is the size difference ? If you can gain room for more features, then the gains could be made there.
    eg can you add debug options and still meet speed ?

    A special case would be a Faster Spin Interpreter, with Debug tests added, and set to run at the same speed as original spin. Now, you have good Debug coverage, and the code should be identically timed ?
  • Cluso99 Posts: 18,069
    edited 2015-04-14 16:00
    Here are some more results based on the benchmarks created for spin v pasm v c etc over here...
    http://forums.parallax.com/showthread.php/142803-Some-multi-language-benchmarks?highlight=language+benchmarks
    	P1V 80MHz P1V 80MHz	
    	P1 Spin	Cluso	Faster%
    toggle	977664	833616	14.7%
    fibo 1	2192	2144	2.2%
    fibo 2	6288	5888	6.4%
    fibo 3	10384	9632	7.2%
    fibo 4	18576	17120	7.8%
    fibo 5	30864	28352	8.1%
    fibo 6	51344	47072	8.3%
    fibo 7	84112	77024	8.4%
    fibo 8	137360	125696	8.5%
    fibo 9	223376	204320	8.5%
    fibo 10	362640	331616	8.6%
    fibo 11	587920	537536	8.6%
    fibo 12	952464	870752	8.6%
    fibo 13	1542288	1409888	8.6%
    fibo 14	2496656	2282240	8.6%
    fibo 15	4040848	3693728	8.6%
    fft v1.0	117225712	116332496	0.8%
    fft v2.0	145591760	139424544	4.2%
    
    
  • Cluso99 Posts: 18,069
    edited 2015-04-14 17:04
    jmg wrote: »
    At least in no case is it slower :)

    What is the size difference ? If you can gain room for more features, then the gains could be made there.
    eg can you add debug options and still meet speed ?
    It uses an extra 1KB of ROM (currently overwrites 25% of the SIN Tables).
    There is now some small space remaining in the cog once loaded, so some other improvements can be made.

    My Debug can be added, but requires the support of hub space. It runs considerably slower because it is running the interpreter in LMM mode.
    A lot depends on the debugger and the facilities required.
    jmg wrote: »
    A special case would be a Faster Spin Interpreter, with Debug tests added, and set to run at the same speed as original spin. Now, you have good Debug coverage, and the code should be identically timed ?
    It's not possible to return to the same speed as the original interpreter with my version. There is too much intertwining of code in Chip's interpreter to make it fit.
  • jmg Posts: 15,173
    edited 2015-04-15 03:50
    Cluso99 wrote: »
    There is now some small space remaining in the cog once loaded, so some other improvements can be made.
    One of the simplest debugs, which may suit Spin, is a compare-PC checker within the byte-reader, and it allows 1 (maybe 2?) breakpoints where the code runs just slightly slower until it hits the compare values.
    An alternative would be to always report PC into HUB, and have another COG doing a profile like trace.
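
Both ideas in the post above can be modelled in a few lines. This is only a toy Python sketch, not Spin interpreter code: it assumes every bytecode is one byte long, and the names `run`, `breakpoints` and `trace` are hypothetical. The PC is compared against a small breakpoint set before each fetch and, if a trace list is supplied, every PC is recorded there, standing in for a hub location that another cog could sample.

```python
from typing import Iterable, List, Optional


def run(bytecode: bytes,
        breakpoints: Iterable[int] = (),
        trace: Optional[List[int]] = None) -> Optional[int]:
    """Toy fetch loop; returns the PC of the first breakpoint hit, else None."""
    bps = frozenset(breakpoints)
    pc = 0
    while pc < len(bytecode):
        if trace is not None:
            trace.append(pc)      # "always report PC" variant
        if pc in bps:             # compare-PC check inside the byte reader
            return pc
        _opcode = bytecode[pc]    # a real interpreter would decode and execute here
        pc += 1                   # assumption: every opcode is a single byte
    return None


pcs: List[int] = []
hit = run(bytes(8), breakpoints=[5], trace=pcs)
print(hit, pcs)
```

The cost of the check is one compare per fetched byte, which matches the "just slightly slower" expectation in the post, but the real overhead would depend on how the PASM byte reader is written.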
  • Cluso99 Posts: 18,069
    edited 2015-04-15 04:33
    jmg wrote: »
    One of the simplest debugs, which may suit Spin, is a compare-PC checker within the byte-reader, and it allows 1 (maybe 2?) breakpoints where the code runs just slightly slower until it hits the compare values.
    An alternative would be to always report PC into HUB, and have another COG doing a profile like trace.
    My debugger does a full trace of pasm as it is executed. So it can either debug the pasm in the interpreter, or debug at the higher level of spin. It has some parameters but does not allow breakpoints. Anyway, it will have to wait as I currently don't have enough time available to revisit it.
    jazzed also did a debugger, but don't recall its features or what it was called.
  • dMajo Posts: 855
    edited 2015-04-15 14:15
    Cluso99 wrote: »
    jazzed also did a debugger, but don't recall its features or what it was called.

    BMA
  • jmg Posts: 15,173
    edited 2015-04-15 15:56
    dMajo wrote: »

    That looks impressive, but seems an old and surprisingly short thread for such a gem ?
    Is it still active, or does it rely on BSTC ?
  • Tubular Posts: 4,703
    edited 2015-04-15 17:21
    Nice work Cluso, and I like the debug thinking jmg.
  • dMajo Posts: 855
    edited 2015-04-16 17:14
    jmg wrote: »
    That looks impressive, but seems an old and surprisingly short thread for such a gem ?
    Is it still active, or does it rely on BSTC ?
    AFAIK this is the latest release. The debugger thread is this

    There is another gem by Jazzed
  • Cluso99 Posts: 18,069
    edited 2015-04-18 02:55
    Here is my latest working rom high version...

    It has my Faster Spin Interpreter (ClusoInterpreter) which now fits the vector table in without requiring the loss of the SIN table.
    I have rearranged the runner into 3 parts to free a space for the vector table. The booter has the mods done by ozpropdev to report the correct version if hub > 64KB.
    The propeller version is identified as "3".

    The rom is unscrambled. You will require the modified hub.v and hub_mem.v Verilog files and the cluso_romhi_031.hex file.
    cluso_romhi_031.hex is composed of the additional files included in the zip file.

    This is the hub high rom memory map...
    // cluso_romhi_031.hex
    // +---------------------------------------------------------------------------+
    // |  $C000..$CFFF    log table                                                |
    // +---------------------------------------------------------------------------+
    // |  $D000..$DFFF    antilog table                                            |
    // +---------------------------------------------------------------------------+
    // |  $E000..$F003    sin table                                                |
    // +---------------------------------------------------------------------------+
    // |  $F004..$F7A3    Spin Interpreter                                         |
    // +---------------------------------------------------------------------------+
    // |  $F7A4..$F7FF    runner.spin (part 2)                                     |
    // +---------------------------------------------------------------------------+
    // |  $F800..$FB97    booter.spin                                              |
    // +---------------------------------------------------------------------------+
    // |  $FB98..$FBC7    runner.spin (part 3)                                     |
    // +---------------------------------------------------------------------------+
    // |  $FBC8..$FBF7    copyright message                                        |
    // +---------------------------------------------------------------------------+
    // |  $FBF8..$FFF7    vector table for cluso interpreter                       |
    // +---------------------------------------------------------------------------+
    // |  $FFF8..$FFFF    runner.spin (part 1) includes stop/run/chipver           |
    // +---------------------------------------------------------------------------+
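
As a quick sanity check on the map above, the following Python sketch computes each region's size and confirms that the regions tile $C000..$FFFF with no gaps or overlaps. The address ranges are copied from the map; the helper name `check_map` is made up for illustration.

```python
# (start, end inclusive, label), copied from the memory map above
REGIONS = [
    (0xC000, 0xCFFF, "log table"),
    (0xD000, 0xDFFF, "antilog table"),
    (0xE000, 0xF003, "sin table"),
    (0xF004, 0xF7A3, "Spin Interpreter"),
    (0xF7A4, 0xF7FF, "runner.spin (part 2)"),
    (0xF800, 0xFB97, "booter.spin"),
    (0xFB98, 0xFBC7, "runner.spin (part 3)"),
    (0xFBC8, 0xFBF7, "copyright message"),
    (0xFBF8, 0xFFF7, "vector table"),
    (0xFFF8, 0xFFFF, "runner.spin (part 1)"),
]


def check_map(regions, first=0xC000, last=0xFFFF):
    """Return {label: size in bytes}; raise ValueError on any gap or overlap."""
    expected = first
    sizes = {}
    for start, end, label in regions:
        if start != expected:
            raise ValueError(f"{label}: starts at ${start:04X}, expected ${expected:04X}")
        sizes[label] = end - start + 1
        expected = end + 1
    if expected != last + 1:
        raise ValueError(f"map ends at ${expected - 1:04X}, expected ${last:04X}")
    return sizes


sizes = check_map(REGIONS)
for label, size in sizes.items():
    print(f"{label:22s} {size:5d} bytes")
print("total", sum(sizes.values()))
```

With the ranges as listed, the vector table occupies 1024 bytes and the ten regions add up to the full 16384 bytes of the upper ROM.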
    

    cluso_romhi_031.zip
    hub_mem_031.zip