Prop2Play - audio player for a P2 [0.29] - now works with P2-EC32MB - Page 3 — Parallax Forums

Comments

  • rogloh Posts: 4,433

    Is the text scrolling portion coming out from the PSRAM framebuffer too?

  • pik33 Posts: 1,732
    edited 2022-04-08 10:22

    Yes, both scroll fields, horizontal and vertical, are in the PSRAM. Horizontal is easy: the display list does the job of changing the starting point for the line. The vertical scroll is made in a secondary buffer and is blitted every frame to the file info field. That takes 5 milliseconds with 4-bit PSRAM. The driver is now in full graphics mode, 1024x576 8bpp.
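
    The horizontal scroll trick can be sketched roughly like this: only the per-line start address changes and no pixels are copied. This is a minimal illustration, not the driver's code. STRIP_WIDTH, line_start and update_scroll_entries are hypothetical names, and the strip is assumed to be wide enough that a fetch starting at the scrolled position never runs past the end of its line.

    ```c
    #include <stdint.h>

    /* Hypothetical layout: the scroll text is one wide strip in PSRAM,
       STRIP_WIDTH bytes per line at 8 bpp. Not taken from the real driver. */
    #define STRIP_WIDTH 4096u

    /* PSRAM address where fetching of strip line y should start when the
       text is scrolled scroll_x pixels to the left. */
    static uint32_t line_start(uint32_t strip_base, uint32_t y, uint32_t scroll_x)
    {
        return strip_base + y * STRIP_WIDTH + (scroll_x % STRIP_WIDTH);
    }

    /* Refresh the display-list entries of the scroll field: one start
       address per displayed line, nothing is blitted. */
    static void update_scroll_entries(uint32_t *dl, uint32_t lines,
                                      uint32_t strip_base, uint32_t scroll_x)
    {
        for (uint32_t y = 0; y < lines; y++)
            dl[y] = line_start(strip_base, y, scroll_x);
    }
    ```

    Calling update_scroll_entries once per frame with an increasing scroll_x moves the text; the vertical field cannot use this shortcut, which is why it is blitted instead.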

  • rogloh Posts: 4,433

    Wow, you're really making the most of the 4bit variant.

  • pik33 Posts: 1,732
    edited 2022-04-08 10:40

    I do what I can :) A lot of asm and optimization was needed to do this. The audio driver caches 256 bytes in the hub for each of its channels, preloading the next 256 bytes from PSRAM when needed. The video driver preloads each line 2 lines before it is displayed, so loading time glitches don't harm the picture. Blitting is slower than it could be, but that cog has low priority.
    The code is now pushed here: https://github.com/pik33/P2-retromachine/tree/main/Propeller/Tracker player psram
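
    For readers who want a picture of the per-channel cache, here is a simplified sketch. It fetches a 256-byte block on demand rather than ahead of time, so it is not the preloading scheme itself, and fake_psram, chan_cache and cache_read are hypothetical names invented for the illustration.

    ```c
    #include <stdint.h>
    #include <string.h>

    #define CACHE_SIZE 256u

    /* Stand-in for external PSRAM so the sketch runs on a PC.
       Addresses passed to cache_read must be below sizeof fake_psram. */
    static uint8_t fake_psram[4096];

    typedef struct {
        uint8_t  data[CACHE_SIZE];  /* hub-side copy of one 256-byte block */
        uint32_t base;              /* PSRAM address of data[0] */
        int      valid;             /* 0 until the first block is loaded */
        unsigned reloads;           /* number of block fetches so far */
    } chan_cache;

    /* Return the byte at PSRAM address addr, fetching the enclosing
       256-byte block first if it is not the one currently cached. */
    static uint8_t cache_read(chan_cache *c, uint32_t addr)
    {
        uint32_t block = addr & ~(CACHE_SIZE - 1u);
        if (!c->valid || c->base != block) {
            memcpy(c->data, &fake_psram[block], CACHE_SIZE);
            c->base = block;
            c->valid = 1;
            c->reloads++;
        }
        return c->data[addr - block];
    }
    ```

    With one such structure per channel, a sample fetch touches PSRAM only once per 256 bytes; a real preloading version would start the next fetch before the current block runs out.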

    I now have to make some changes I want in SIDCog. The player still doesn't show the oscilloscope and bars for SID tunes, and I also want them "spatialized", which requires filtering every SID channel individually. I also don't want the main loop to exceed 680 clocks (so as not to reduce the SID frequency from SID/2 to SID/4). There are several possibilities to optimize this module for the player, including reorganizing the SID registers and placing the combined waveforms in LUT.

  • evanh Posts: 13,372
    edited 2022-04-08 10:53

    @pik33 said:
    Do I need to change something else than this constant? Meanwhile I returned to 8, 7 was not stable enough.

    Nope, just that. It's at the bottom of the disk_init function. I'd edit the 300 MHz entry, change it to 350 MHz and replace the 0x0003_0007 with 0x0002_0006.

    // Performance option (Up to 50 MHz SPI clock)
        if( tmr <= 150_000_000 )  tmout = 0x0002_0004;  // sysclock/4
        else if( tmr <= 250_000_000 )  tmout = 0x0002_0005;  // sysclock/5
        else if( tmr <= 300_000_000 )  tmout = 0x0003_0007;  // sysclock/7
        else  tmout = 0x0003_0008;  // sysclock/8
    

    PS: I had already tweaked the latest zip copy to 350 MHz up from 300 MHz. Maybe I should revert that if you don't have any joy.

    EDIT: What is the Edge Card plugged into? Are there tracks for pins P58..P61 on the carrier board? They could be slowing it down.

  • pik33 Posts: 1,732
    edited 2022-04-08 10:53

    @rogloh said:
    Wow, you're really making the most of the 4bit variant.

    And now, what can be done with the 32 MB 16-bit version? With 4x the bandwidth, at 1024x576 8bpp we can have a compositing window manager. I did such a thing on the RPi, so I know how to do this. 4-bit has too little bandwidth to do the compositing and the displaying at the same time. 8-bit may be enough, but I need an 8-bit driver, and I need to experiment to determine whether an 8-bit contraption with CS and clock wired to the next pin bank can work.

  • pik33 Posts: 1,732

    EDIT: What is the Edge Card plugged into? Are there tracks for pins P58..P61 on the carrier board? They could be slowing it down.

    This is an Eval board. The Edge (PSRAM-free edition) cannot work with the PSRAMs I have at the speed I want.

  • pik33 Posts: 1,732

    The SID's combined waveforms are now in LUT: up to 3 rdbyte instructions (~13 clocks each) are replaced with rdlut (3 clocks each).

  • evanh Posts: 13,372

    @pik33 said:
    This is Eval. Edge (PSRAM free edition) cannot work with PSRAMs I have at the speed I want

    sysclock/6 is working now?

  • pik33 Posts: 1,732

    I am now at sysclock/8. sysclock/6 didn't work; sysclock/7 works but doesn't seem to be 100% stable. As I am now developing and debugging the player, I reduced the clock to have one less potential source of errors.

  • evanh Posts: 13,372

    Maybe my speed test program will run okay on your hardware at sysclock/6. That would at least explain the difference in our findings.

  • pik33 Posts: 1,732
    edited 2022-04-08 18:03

    I managed to split SIDCog loop into 2 parts and move the register decoding part to the player itself.

    FlexBasic's "cpu asm" is very useful - it allowed me to move the code, along with its internal variables, into a Basic procedure without (a lot of) changes.

    Why did I decide to do that?

    (1) Register decoding is needed only when the player changes the registers, at 50 (up to 400) Hz and not 490 kHz. This shortens the main SIDCog loop time by over 100 cycles; the maximum processing time is now less than 480 cycles.
    (2) Moving the register decoder frees a lot of cog RAM: the code itself, more than 60 longs, plus 16 longs of ADSR values and 7 longs of SID registers.

    This means I now have about 140 longs left in this cog and about 200 clocks left of processing time.
    This should allow disconnecting the SID channels and placing them in the stereo field, with the stereo depth controlled via the player as is done for Amiga modules (including original, standard, full mono). To disconnect the channels I have to have 3 instances of the filter instead of 1. This will add several longs and a lot of processing time.
    This should also allow filling the oscilloscope buffer, outputting samples for animating the color bars and individually switching SID channels on and off, while still running at 490 kHz.
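
    The clock budget behind these numbers can be checked with a little arithmetic. The sketch below is only an illustration: the 338 MHz system clock is an assumption borrowed from the speed test log further down the thread (the post itself does not state the player's clock), and both function names are made up.

    ```c
    #include <stdint.h>

    /* Clocks available per output sample: system clock / sample rate. */
    static uint32_t clocks_per_sample(uint32_t sysclk_hz, uint32_t sample_hz)
    {
        return sysclk_hz / sample_hz;
    }

    /* Clocks left once the worst-case loop time is subtracted. */
    static int32_t headroom(uint32_t budget, uint32_t worst_case)
    {
        return (int32_t)budget - (int32_t)worst_case;
    }
    ```

    Under that assumption, clocks_per_sample(338000000, 490000) gives 689 and headroom(689, 480) gives 209, which is in line with the "about 200 clocks left" figure.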

  • @pik33 said:
    I managed to split SIDCog loop into 2 parts and move the register decoding part to the player itself.

    FlexBasic's "cpu asm" is very useful - it allowed me to move the code, along with its internal variables, into a Basic procedure without (a lot of) changes.

    Why did I decide to do that?

    (1) Register decoding is needed only when the player changes the registers, at 50 (up to 400) Hz and not 490 kHz. This shortens the main SIDCog loop time by over 100 cycles; the maximum processing time is now less than 480 cycles.
    (2) Moving the register decoder frees a lot of cog RAM: the code itself, more than 60 longs, plus 16 longs of ADSR values and 7 longs of SID registers.

    Yep. I've done that trick with OPN2Cog - the data ingested by the emulation cog is very different from the register map of a real YM2612 and the code that handles reg writes handles the translation. In particular, values needed for detune and rate scaling are pre-computed every time the relevant registers change.

  • pik33 Posts: 1,732

    @evanh said:
    Maybe my speed test program will run okay on your hardware at sysclock/6. That would at least explain the difference in our findings.

    I ran the speed program. It works OK at 7 clocks. 6 clocks doesn't work at all here. After the revolution I am now doing with the player, I will return to 7 clocks and check whether it is stable with the player.

    These are the results at 338 MHz and sysclock/7:

    ( Entering terminal mode.  Press Ctrl-] or Ctrl-Z to exit. )
       clkfreq = 338000000   clkmode = 0x124a8fb
    addr1 = 0xfa34  addr2 = 0x280d4   Randfill ticks = 225070
     Written 100000 of 100000 bytes at 3420 kB/s
     Read 100000 of 100000 bytes at 5040 kB/s  Matches!  :)
    addr1 = 0xfa34  addr2 = 0x280d4   Randfill ticks = 225070
     Written 100000 of 100000 bytes at 3334 kB/s
     Read 100000 of 100000 bytes at 5039 kB/s  Matches!  :)
    addr1 = 0xfa34  addr2 = 0x280d4   Randfill ticks = 225070
     Written 100000 of 100000 bytes at 3329 kB/s
     Read 100000 of 100000 bytes at 5035 kB/s  Matches!  :)
    addr1 = 0xfa34  addr2 = 0x280d4   Randfill ticks = 225070
     Written 100000 of 100000 bytes at 3036 kB/s
     Read 100000 of 100000 bytes at 5039 kB/s  Matches!  :)
    addr1 = 0xfa34  addr2 = 0x43654   Randfill ticks = 477070
     Written 212000 of 212000 bytes at 3885 kB/s
     Read 212000 of 212000 bytes at 5216 kB/s  Matches!  :)
    addr1 = 0xfa34  addr2 = 0x43654   Randfill ticks = 477070
     Written 212000 of 212000 bytes at 3959 kB/s
     Read 212000 of 212000 bytes at 5215 kB/s  Matches!  :)
    addr1 = 0xfa34  addr2 = 0x43654   Randfill ticks = 477070
     Written 212000 of 212000 bytes at 4011 kB/s
     Read 212000 of 212000 bytes at 5219 kB/s  Matches!  :)
    addr1 = 0xfa34  addr2 = 0x43654   Randfill ticks = 477070
     Written 212000 of 212000 bytes at 3816 kB/s
     Read 212000 of 212000 bytes at 5211 kB/s  Matches!  :)
    addr1 = 0xfa34  addr2 = 0x43654   Randfill ticks = 477070
     Written 212000 of 212000 bytes at 3920 kB/s
     Read 212000 of 212000 bytes at 5215 kB/s  Matches!  :)
    addr1 = 0xfa34  addr2 = 0x14854   Randfill ticks = 45070
     Written 20000 of 20000 bytes at 1808 kB/s
     Read 20000 of 20000 bytes at 4802 kB/s  Matches!  :)
    addr1 = 0xfa34  addr2 = 0x14854   Randfill ticks = 45070
     Written 20000 of 20000 bytes at 1682 kB/s
     Read 20000 of 20000 bytes at 4798 kB/s  Matches!  :)
    addr1 = 0xfa34  addr2 = 0x14854   Randfill ticks = 45070
     Written 20000 of 20000 bytes at 1685 kB/s
     Read 20000 of 20000 bytes at 4808 kB/s  Matches!  :)
    addr1 = 0xfa34  addr2 = 0x14854   Randfill ticks = 45070
     Written 20000 of 20000 bytes at 1885 kB/s
     Read 20000 of 20000 bytes at 4778 kB/s  Matches!  :)
    addr1 = 0xfa34  addr2 = 0x14854   Randfill ticks = 45070
     Written 20000 of 20000 bytes at 1709 kB/s
     Read 20000 of 20000 bytes at 4813 kB/s  Matches!  :)
    addr1 = 0xfa34  addr2 = 0x10204   Randfill ticks = 4566
     Written 2000 of 2000 bytes at 279 kB/s
     Read 2000 of 2000 bytes at 2133 kB/s  Matches!  :)
    addr1 = 0xfa34  addr2 = 0x10204   Randfill ticks = 4566
     Written 2000 of 2000 bytes at 277 kB/s
     Read 2000 of 2000 bytes at 2096 kB/s  Matches!  :)
    addr1 = 0xfa34  addr2 = 0x10204   Randfill ticks = 4566
     Written 2000 of 2000 bytes at 273 kB/s
     Read 2000 of 2000 bytes at 2178 kB/s  Matches!  :)
    addr1 = 0xfa34  addr2 = 0x10204   Randfill ticks = 4566
     Written 2000 of 2000 bytes at 282 kB/s
     Read 2000 of 2000 bytes at 2166 kB/s  Matches!  :)
    addr1 = 0xfa34  addr2 = 0x10204   Randfill ticks = 4566
     Written 2000 of 2000 bytes at 254 kB/s
     Read 2000 of 2000 bytes at 2073 kB/s  Matches!  :)
    addr1 = 0xfa34  addr2 = 0xfafc   Randfill ticks = 518
     Written 200 of 200 bytes at 35 kB/s
     Read 200 of 200 bytes at 589 kB/s  Matches!  :)
    addr1 = 0xfa34  addr2 = 0xfafc   Randfill ticks = 518
     Written 200 of 200 bytes at 34 kB/s
     Read 200 of 200 bytes at 626 kB/s  Matches!  :)
    addr1 = 0xfa34  addr2 = 0xfafc   Randfill ticks = 518
     Written 200 of 200 bytes at 36 kB/s
     Read 200 of 200 bytes at 613 kB/s  Matches!  :)
    addr1 = 0xfa34  addr2 = 0xfafc   Randfill ticks = 518
     Written 200 of 200 bytes at 35 kB/s
     Read 200 of 200 bytes at 571 kB/s  Matches!  :)
    addr1 = 0xfa34  addr2 = 0xfafc   Randfill ticks = 518
     Written 200 of 200 bytes at 35 kB/s
     Read 200 of 200 bytes at 593 kB/s  Matches!  :)
    
  • evanh Posts: 13,372
    edited 2022-04-09 12:15

    Pik,
    There is another possible adjustment that might help. The two lines below the divider code could be replaced too, i.e.:

    // Performance option (Up to 50 MHz SPI clock)
        if( tmr <= 150_000_000 )  tmout = 0x0002_0004;  // sysclock/4
        else if( tmr <= 250_000_000 )  tmout = 0x0002_0005;  // sysclock/5
        else if( tmr <= 350_000_000 )  tmout = 0x0002_0006;  // sysclock/6
        else  tmout = 0x0003_0008;  // sysclock/8
    
        m_txsp |= P_INVERT_B;  // falling clock + 4 tick lag + 1 tick (smartB registration)
    

    What might be happening is that the clock pin's added latency (the pin is used as the smartB input on the tx and rx smartpins), even with full I/O registration, is still extending by an extra tick (two instead of one) around the 330-340 MHz mark. Removing the tx pin I/O registration, as above, may help compensate.

    [doh, edited the wrong post]

  • pik33 Posts: 1,732
    edited 2022-04-09 17:24

    7 is still the limit for my cards / Eval boards.


    Meanwhile I managed to modify what was left of SIDCog to fit it to the rest of the player.
    The result: the oscilloscope, bars, volume and stereo separation controls now work for .dmp files. The loop is full... I have 684 clocks for the main SID loop and it seems (nearly) all of it has been eaten. 3 independent filters eat a lot of time. A lot of optimizations were made to SIDCog, of which the most important were moving the combined waveforms to LUT (and creating a new combined waveforms file for this purpose, since the samples had to be reordered) and rewriting the amplitude computation. The emulated SID still runs at 490 kHz. I now have to test this by listening to as many .dmp files as I can :) to check that there are no distortions, clicks and hangups.

  • evanh Posts: 13,372
    edited 2022-04-09 18:02

    Thanks.

    I'm coming to terms with the limitations. Even the streamer will have issues but to a lesser extent. Those transition bands require some extra leeway to be built into the timing allowances.

    EDIT: Okay, and again. Fingers crossed. https://forums.parallax.com/discussion/comment/1537713/#Comment_1537713

  • pik33 Posts: 1,732

    Still nothing less than 7. Either these cards are limited to 50 MHz or the Eval board doesn't allow more. The new Edges with an SD slot have a shorter path to the card, but I don't have a new Edge to test.
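
    The 50 MHz guess lines up with simple division. As a rough sketch (function names are hypothetical, and reading the low word of tmout as the divider is an assumption based on the comments in the table quoted earlier in the thread):

    ```c
    #include <stdint.h>

    /* Divider selected by the revised table quoted earlier in the thread. */
    static uint32_t table_divider(uint32_t sysclk_hz)
    {
        if (sysclk_hz <= 150000000u) return 4;
        if (sysclk_hz <= 250000000u) return 5;
        if (sysclk_hz <= 350000000u) return 6;
        return 8;
    }

    /* Resulting SPI clock for a given divider (integer Hz, rounded down). */
    static uint32_t spi_clock_hz(uint32_t sysclk_hz, uint32_t divider)
    {
        return sysclk_hz / divider;
    }
    ```

    At the 338 MHz used for the speed test, sysclock/7 is about 48.3 MHz and sysclock/6 is about 56.3 MHz, so the working setting sits just under 50 MHz and the failing one just over it. That fits either explanation (card limit or board wiring) and does not distinguish between them.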

  • evanh Posts: 13,372
    edited 2022-04-10 01:07

    That's fine. As long as those divider settings I've now chosen all work, that's good enough. The streamer will do better once that method is implemented. The advantage of smartpins is that those resources are dedicated to those pins anyway. And it allows a hubexec edition, whereas there is only one streamer in a cog.

    PS: You were still getting errors at 7 earlier, so it is an improvement.

  • pik33 Posts: 1,732
    edited 2022-04-10 11:52

    0.26

    Playing .dmp files is now integrated with the rest of the player. Oscilloscope, bars, channel muting, volume and stereo separation now also work for .dmp. To do: make this work for .wav too (easy, this is the same audio driver as for .mod, but not done yet).
    Added a new "window", "Status", which now shows the playing parameters and the loaded file size.
    The archive now contains spccog, although for now it is only included and not used. This is the next format I want to add to the player.

  • Wuerfel_21 Posts: 3,006
    edited 2022-04-10 12:41

    If you want to get volume bars for SPCcog, you can do that without hacking it, the channel data lives in hub variables to begin with. Though there's three variables to worry about, left/right volume and current envelope.
    Untested, but should work in theory:

    '' Get current volume for channel (zero to $7FF00)
    PUB get_current_volume(channel) : v | chbase
      chbase := @chanstate[channel*CHANNEL_SIZE]
      return (abs(byte[chbase][14*4 + 1] signx 7) + abs(byte[chbase][14*4 + 3] signx 7))*long[chbase][10]
    

    EDIT: Do note that nothing stops the actual samples from being recorded at varying volume levels, though that's a problem the MOD player has, too, I guess. Though SPC's BRR samples have 15 bit(?) dynamic range, so I guess there's less incentive to record them hot? IDK.
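
    For anyone reading along without Spin2, the same calculation can be sketched in C. This is a paraphrase under assumptions, not SPCcog code: the three inputs are taken to be the signed 8-bit left and right volumes and the current envelope level, and the envelope maximum of $7FF is inferred from the "$7FF00" range in the comment above.

    ```c
    #include <stdint.h>
    #include <stdlib.h>

    /* Combined level for one channel: (|left vol| + |right vol|) * envelope.
       int8_t arguments mirror the "signx 7" sign extension in the Spin2 code. */
    static int32_t current_volume(int8_t vol_l, int8_t vol_r, int32_t envelope)
    {
        return (abs(vol_l) + abs(vol_r)) * envelope;
    }
    ```

    With both volumes at -128 and the envelope at $7FF the result is 256 * $7FF = $7FF00, which matches the stated upper bound.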

  • pik33 Posts: 1,732

    A stupid bug introduced while cleaning up 0.26 disabled the scrolling module file info, which worked only when the module was **not** playing (and scrolled the empty panel). Corrected.

  • pik33 Posts: 1,732
    edited 2022-04-11 09:45

    .spc files now play, without any info and effects yet (to do)

    though that's a problem the MOD player has

    The bars get output samples from the mod player directly, so there is no problem. The bars and the oscilloscope react to the main volume level in .mod. The SID is somewhat different: it outputs these 3 channel samples before they are attenuated by the main volume, so while the oscilloscope reacts to it, the bars don't. I apply the volume after the mixer, which saves 4 clocks; that was critical, as timings are very tight there.

    .... edit: oscilloscope output code inserted into spccog. Still to do (for SPC and SID): get rid of the hardcoded pointer.

    The code looks generally like this

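                setword pa,pb,#1        ' pack: high word of pa := low word of pb
                mov scptr2,scptr
                shl scptr2,#2           ' index -> byte offset (4 bytes per long)
                add scptr2,scbase       ' absolute hub address of the slot
                wrlong pa,scptr2        ' store the packed sample
                incmod scptr,##639      ' next slot, wrapping to 0 after 639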
    

    I added this directly after

                  wypin pa,dsp_lpin
                  wypin pb,dsp_rpin
    

    and it needs 3 cog variables to be added.
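
    The same ring-buffer write can be sketched in C for readers less familiar with PASM2. It assumes pa carries the left sample and pb the right one (going by the wypin lines above); scope_buf stands in for the hub memory at scbase, and scope_push is a hypothetical name.

    ```c
    #include <stdint.h>

    #define SCOPE_LEN 640u   /* incmod scptr,##639 wraps after index 639 */

    static uint32_t scope_buf[SCOPE_LEN];  /* stands in for hub memory at scbase */
    static uint32_t scptr;                 /* next slot to write */

    static void scope_push(uint16_t left, uint16_t right)
    {
        /* setword pa,pb,#1: the right sample goes into the upper word */
        uint32_t packed = ((uint32_t)right << 16) | left;
        /* wrlong to scbase + (scptr << 2) */
        scope_buf[scptr] = packed;
        /* incmod: increment, wrapping to 0 once the limit is reached */
        scptr = (scptr == SCOPE_LEN - 1u) ? 0u : scptr + 1u;
    }
    ```

    The three cog variables mentioned above correspond to scptr, scptr2 (the temporary address) and scbase.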

  • Wuerfel_21 Posts: 3,006
    edited 2022-04-11 09:42

    Well in that case, the decoded sample with envelope applied (dsp_output) is just the next long after the envelope level (dsp_envlvl). Still need to take the volume registers into account, ofc. You could also hack the code to output it all pre-multiplied, there's plenty of cycles.

  • pik33 Posts: 1,732
    edited 2022-04-11 11:08

    The problem with SPCmeta and Basic seems to be solved. SPCmeta returns the pointer and the length; Basic's "print", when fed with this pointer, prints the pointer instead of the string. However, my "write" function from the video driver works, so these metadata will be displayed in the "file info" panel in the near future :)
    Still fighting with meta :(

    ... it was called too fast! It seems waitms(1000) between starting a song and calling meta fixes the problem.... I hope

    No, it didn't fix this. :( I still cannot do anything with this object from FlexBasic.... And now it printed the data... after about a minute of song playing.. The code I am using to get the metadata looks like this, and it works OK if I get only the song title. Adding the game title causes this strange behavior..

    p,l=meta.getsongtitle(@spcfile)
    let p$="": for i=0 to l-1 : p$=p$+chr$(peek(p+i)) : next i
    v.write ("Song title: "): v.writeln(p$)
    
    p,l=meta.getgametitle(@spcfile)
    let p$="": for i=0 to l-1 : p$=p$+chr$(peek(p+i)) : next i
    v.write ("Game title: "): v.writeln(p$)
    

    Edit: it started to work if I added a delay between meta function calls

    It seems I have to give up and try to write the metadata decoder directly in Basic

  • Huh, there should be no correlation at all between playing the file and reading its metadata. No idea what the issue could be, so try writing it yourself ig and see how that turns out.

  • pik33 Posts: 1,732
    edited 2022-04-11 14:31

    I wrote a metadata decoder myself from scratch, based on the file format specification. It is not complete: there are more fields in the ID666 tag than I retrieve and display. Now the file info panel shows the song title, game title, artist name, comments, dumper name, emulator, song time and fade time. I think it's enough.
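
    The core of such a decoder is pulling fixed-width text fields out of the file image. A minimal sketch, assuming fields shorter than their slot are NUL padded; copy_field is a hypothetical helper, and the offsets and widths are deliberately left as parameters to be taken from the format specification rather than guessed here.

    ```c
    #include <stddef.h>
    #include <string.h>

    /* Copy a fixed-width text field into dst as a C string.
       Stops at the field width, at a NUL byte, or when dst is full.
       Returns the number of characters copied. */
    static size_t copy_field(const unsigned char *file, size_t offset,
                             size_t width, char *dst, size_t dst_size)
    {
        size_t n = 0;
        if (dst_size == 0)
            return 0;
        while (n < width && n + 1 < dst_size && file[offset + n] != '\0') {
            dst[n] = (char)file[offset + n];
            n++;
        }
        dst[n] = '\0';   /* always terminate, even for a full-width field */
        return n;
    }
    ```

    A field that fills its whole slot has no terminator in the file, so the destination buffer needs to be at least one byte larger than the field width.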

    The oscilloscope works; what is left to be done are the bars and the main volume control.

    The module which caused the problem is "loz3-31.spc". It has the song title in the main header and the game title in the extended header. I thought it didn't work at all, but it actually works, with a delay of a minute or two.

  • pik33 Posts: 1,732

    It seems the volume getting function works. I now have to multiply these bars to get 8 of them. Is there any simple way to mute the channel?

  • Wuerfel_21 Posts: 3,006
    edited 2022-04-11 15:07

    @pik33 said:
    It seems the volume getting function works. I now have to multiply these bars to get 8 of them. Is there any simple way to mute the channel?

    No, but there's a section where channel output is mixed into the main and echo accumulators. If you skip it or force dsp_output to zero, it'll be muted. But you'll need to figure out how to get the mute flag in there. Note that if you add a new VAR, you may need to shift some hardcoded offsets about (but don't worry, it'll fail catastrophically if you don't)

                  '' Output!
                  muls dsp_output,dsp_envlvl
                  sar dsp_output,#11+1
                  mov dsp_modout,dsp_output
                  ' Mix into output
                  getword dsp_tmp2,dsp_cvol,#1 ' right volume (left is in lowword of dsp_cvol)
                  getbyte pa,dsp_pmon_non_eon,#2
                  testb pa,dsp_chan wc ' Echo on?
                  scas dsp_output,dsp_cvol
                  add dsp_lsample,0-0
                  fges dsp_lsample,dsp_min
                  fles dsp_lsample,dsp_max
            if_c  scas dsp_output,dsp_cvol
            if_c  add dsp_lecho,0-0
                  fges dsp_lecho,dsp_min
                  fles dsp_lecho,dsp_max
                  scas dsp_output,dsp_tmp2
                  add dsp_rsample,0-0
                  fges dsp_rsample,dsp_min
                  fles dsp_rsample,dsp_max
            if_c  scas dsp_output,dsp_tmp2
            if_c  add dsp_recho,0-0
                  fges dsp_recho,dsp_min
                  fles dsp_recho,dsp_max
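
    The "force dsp_output to zero" idea can be pictured in C as below. This is only a sketch of the approach, not a translation of the PASM above: mute_mask is a hypothetical per-channel flag word, and both the 1/16384 volume scale and the 16-bit clamp limits are assumptions that should be checked against the P2 documentation and the real dsp_min/dsp_max values.

    ```c
    #include <stdint.h>

    static int32_t clamp_s(int32_t v, int32_t lo, int32_t hi)
    {
        return v < lo ? lo : (v > hi ? hi : v);
    }

    /* Mix one channel sample into an accumulator.
       mute_mask: bit n set means channel n is muted (hypothetical flag word). */
    static int32_t mix_channel(int32_t acc, int32_t sample, int32_t volume,
                               unsigned chan, uint32_t mute_mask)
    {
        if (mute_mask & (1u << chan))
            sample = 0;                      /* same effect as zeroing dsp_output */
        acc += (sample * volume) / 16384;    /* assumed volume scale factor */
        return clamp_s(acc, -32768, 32767);  /* assumed 16-bit output limits */
    }
    ```

    Zeroing the sample before the mix silences the channel in both the main and the echo paths, which is why it is simpler than skipping the mixing code.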
    
  • pik33 Posts: 1,732
    edited 2022-04-11 18:00

    Ok, now I know where to add the channel switchers. :) But now, to make this format partially completed, I will output 2 bars (L/R, as for .wav), add the main volume and disable pan and channel control. This is because I now want a 6502.

    Having a 6502 would enable real SID files to play. And there is another sound chip, not yet done: Pokey. There are a lot of Pokey songs available, but they also need a 6502.

    There is Macca's 6502 implementation, which can be a good starting point. Illegal instructions have to be added, at least the most popular ones: LAX is everywhere in these music files. Pokey has to be written. I think it should not be hard to do.
