Prop2 FPGA files!!! - Updated 10 December 2017 - Version 28


Comments

  • We have interrupts for that now.

    :D
    Do not taunt Happy Fun Ball! @opengeekorg ---> Be Excellent To One Another SKYPE = acuity_doug
    Parallax colors simplified: http://forums.parallax.com/showthread.php?123709-Commented-Graphics_Demo.spin
  • potatohead wrote: »
    We have interrupts for that now.
    True, and those have similar problems with same instructions.
    The Prisoner's Dilemma, in english - "Selfishness beats altruism within groups. Altruistic groups beat selfish groups." - Quoted part from 2007, D.S Wilson/E.O Wilson.
  • potatohead wrote: »
    We have interrupts for that now.

    :D

    And, somewhat related (i.e. RESIx/RETIx), we have CALLD, which can be used for cooperative multitasking.
  • cgracey Posts: 8,343
    edited December 2
    Over the last several days, I've been consolidating all instruction decoding to the cycle before the two cycles that the instructions actually execute in.

    This has two advantages:

    1) It actually saves logic, since replica logic doesn't exist at two different pipeline stages.
    2) It makes things go faster.

    Prop2-Hot worked this way, but I had abandoned this in the new design, since it requires a flipflop per decode. I guess the new design drifted to the point where there were few enough decodes that flops became more efficient than extra logic.

    This change caused Fmax, for the 8-cog/64-smartpin Cyclone V A9 boards to go from 84.0 MHz to 89.6 MHz. That's a 6.7% speed increase that should translate straight into the silicon Fmax.

    And look at the slack histogram on the FPGA. There are just a few dangling paths that are keeping the FPGA from reaching 100 MHz. The ASIC tools will be able to tuck these in a lot tighter.
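As a rough check on those figures, here is a minimal Python sketch of the arithmetic. The 84.0, 89.6 and 100 MHz values come from the post above; the helper names are made up for illustration.

```python
def percent_increase(old_mhz, new_mhz):
    """Relative speed-up going from old_mhz to new_mhz, in percent."""
    return (new_mhz - old_mhz) / old_mhz * 100.0


def period_ns(mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / mhz


OLD_FMAX_MHZ = 84.0   # Fmax before the decode change (from the post)
NEW_FMAX_MHZ = 89.6   # Fmax after the decode change (from the post)
TARGET_MHZ = 100.0    # the FPGA target mentioned in the post

gain = percent_increase(OLD_FMAX_MHZ, NEW_FMAX_MHZ)
# How much the slowest path would have to shrink to hit the target clock.
gap_ns = period_ns(NEW_FMAX_MHZ) - period_ns(TARGET_MHZ)

print(f"speed increase: {gain:.1f}%")
print(f"slowest path must shrink by about {gap_ns:.2f} ns to reach {TARGET_MHZ:.0f} MHz")
```

Run as-is, this reports a 6.7% increase and a gap of a little over 1 ns on the slowest path.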

    I will be getting a v28 out soon and update the documentation accordingly.

    One other thing... I changed the memory mapping slightly so that the last 16KB of hub RAM always appears at both its natural location and at $FC000..$FFFFF. The write-protect mechanism works at both of the last 16KB address ranges. The debug interrupt jumps are always only accessible at the end of the 1MB map, though, and they are subject to the write-protect mechanism. This will let people use the memory more naturally if they don't care about fixed code at the top of the 1MB hub memory map.
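To picture the mirrored range, here is a small Python sketch of the address translation. It assumes 512KB of hub RAM (the size is not stated in this post), and the function name is hypothetical. It does not model write-protect or the debug jumps, and it simply returns None for addresses it does not cover.

```python
HUB_RAM_BYTES = 512 * 1024               # assumption: 512KB of hub RAM fitted
MAP_BYTES = 0x100000                     # 1MB hub address map
MIRROR_BYTES = 16 * 1024                 # last 16KB is mirrored
MIRROR_BASE = MAP_BYTES - MIRROR_BYTES   # $FC000


def hub_ram_offset(addr):
    """Map a hub address to an offset in physical hub RAM.

    Returns None for addresses this sketch does not model.
    """
    if 0 <= addr < HUB_RAM_BYTES:
        return addr                      # natural location
    if MIRROR_BASE <= addr < MAP_BYTES:
        # Mirror of the last 16KB of RAM at the top of the 1MB map.
        return HUB_RAM_BYTES - MIRROR_BYTES + (addr - MIRROR_BASE)
    return None


for a in (0x7C000, 0xFC000, 0xFFFFF):
    print(f"${a:05X} -> RAM offset ${hub_ram_offset(a):05X}")
```

Under the 512KB assumption, $7C000 and $FC000 land on the same RAM location.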

    [Image attachment: 750 x 1377 - 15K]
  • cgracey wrote: »
    Over the last several days, I've been consolidating all instruction decoding to the cycle before the two cycles that the instructions actually execute in.

    This has two advantages:

    1) It actually saves logic, since replica logic doesn't exist at two different pipeline stages.
    2) It makes things go faster.

    Prop2-Hot worked this way, but I had abandoned this in the new design, since it requires a flipflop per decode. I guess the new design drifted to the point where there were few enough decodes that flops became more efficient than extra logic.

    This change caused Fmax, for the 8-cog/64-smartpin Cyclone V A9 boards to go from 84.0 MHz to 89.6 MHz. That's a 6.7% speed increase that should translate straight into the silicon Fmax.

    And look at the slack histogram on the FPGA. There are just a few dangling paths that are keeping the FPGA from reaching 100 MHz. The ASIC tools will be able to tuck these in a lot tighter.

    I will be getting a v28 out soon and update the documentation accordingly.

    One other thing... I changed the memory mapping slightly so that the last 16KB of hub RAM always appears at both its natural location and at $FC000..$FFFFF. The write-protect mechanism works at both of the last 16KB address ranges. The debug interrupt jumps are always only accessible at the end of the 1MB map, though, and they are subject to the write-protect mechanism. This will let people use the memory more naturally if they don't care about fixed code at the top of the 1MB hub memory map.
    The timing tweaks are great news.

    This Hub mapping seems a much better way to me. Thanks for this.
    My Prop boards: P8XBlade2, RamBlade, CpuBlade, TriBlade
    Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
    Website: www.clusos.com
    Prop Tools (Index) , Emulators (Index) , ZiCog (Z80)
  • I must have forgotten how to use LOC...

    Have some code that was working fine with this
    loc       ptra,#@OV965X_REGS_QVGA
    

    But then, I removed some debugging code and it stopped working...
    Replaced with this
    mov       ptra,##@OV965X_REGS_QVGA
    
    and it works again...

    The label, OV965X_REGS_QVGA, is around $400 in HUB
    Prop Info and Apps: http://www.rayslogic.com/
  • Good idea to have the ROM area mirrored. This will make any loading easier, since it is a contiguous block of RAM to load.

    I still do not get the importance of treating the debug vectors different from the rest of the ROM.

    Why are the debug vectors not accessible at the end of 512KB, and why are they excluded from being loaded at boot time?

    just curious,

    Mike
    I am just another Code Monkey.
    A determined coder can write COBOL programs in any language. -- Author unknown.
    Press any key to continue, any other key to quit

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this post are to be interpreted as described in RFC 2119.
  • evanh Posts: 4,428
    edited December 3
    msrobots wrote: »
    I still do not get the importance of treating the debug vectors different from the rest of the ROM.

    This is so the ROM version stays intact by default.

    msrobots wrote: »
    Why are the debug vectors not accessible at the end of 512KB and why they are excluded from being loaded at boot time?

    Well, the jump table data will appear at both. Just the execution is always fixed at the high end.

    This is for compatibility as much as anything. It allows the simplest hard-coded absolute addressing to be used in software. It's not much good having different pieces of code assuming different fixed locations for the table just because they were developed for different editions of the Prop2.
  • rjo__ Posts: 1,855
    edited December 4
    I vote for putting tables in the ROM ... my favorite would be to make p_i log2(p_i) calculations easier ... for measuring information. This can always be put into a file and read into RAM... but I want that RAM for other things!!!

    No doubt there are other tables that would be useful?
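A minimal sketch of what such a table might look like, assuming the quantity meant is the Shannon entropy term -p*log2(p). The table resolution (256 steps) and all names here are hypothetical choices for illustration, not anything proposed for the ROM.

```python
import math

STEPS = 256  # hypothetical resolution: probabilities quantized to k/256


def build_plogp_table(steps=STEPS):
    """table[k] = -(k/steps) * log2(k/steps), with the k = 0 entry set to 0."""
    table = [0.0]
    for k in range(1, steps + 1):
        p = k / steps
        table.append(-p * math.log2(p))
    return table


def entropy_bits(counts, table, steps=STEPS):
    """Estimate entropy in bits from symbol counts using only table lookups."""
    total = sum(counts)
    bits = 0.0
    for c in counts:
        k = round(c * steps / total)   # nearest table index for p = c / total
        bits += table[k]
    return bits


TABLE = build_plogp_table()
print(entropy_bits([1, 1, 1, 1], TABLE))   # four equally likely symbols
print(entropy_bits([3, 1], TABLE))         # a skewed two-symbol source
```

The lookup replaces a logarithm per symbol with one multiply, one divide and one table read, at the cost of quantization error for probabilities that are not multiples of 1/256.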
  • The ROM must be copied to HUB RAM to be usable. You are not getting extra code/table space.
    My Prop boards: P8XBlade2, RamBlade, CpuBlade, TriBlade
    Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
    Website: www.clusos.com
    Prop Tools (Index) , Emulators (Index) , ZiCog (Z80)
  • evanh Posts: 4,428
    edited December 4
    Umm, there is no address space mapped to the Prop2 mask ROM. It is not execute in place (XIP). It is basically a tiny byte-wide ROM poked in a corner of the Prop2, that is copied into HubRAM at boot up time. Cog0, alone I think, has a special circuit and special microcode to access it. Execution only happens once it's in HubRAM. It's the one part of the Prop2 that is not symmetrical.

    EDIT: Dang, Cluso beat me to it.
  • Rayman wrote: »
    I must have forgotten how to use LOC...

    Have some code that was working fine with this
    loc       ptra,#@OV965X_REGS_QVGA
    

    But then, I removed some debugging code and it stopped working...
    Replaced with this
    mov       ptra,##@OV965X_REGS_QVGA
    
    and it works again...

    The label, OV965X_REGS_QVGA, is around $400 in HUB
    @Rayman
    For Hub addresses below $400 use the absolute version of LOC.
    loc       ptra,#\OV965X_REGS_QVGA
    



    Melbourne, Australia
  • Thanks. I was hoping it was a bug and not a feature :)

    Is there a reason that loc can't work with #@ below $400?
    Prop Info and Apps: http://www.rayslogic.com/
  • @Cluso99

    I vote not to put tables in the ROM
  • Rayman wrote: »
    Is there a reason that loc can't work with #@ below $400?

    This might shed some light on it.
    Found this in the instructions_v27.txt file.
    chip wrote:
    A symbol declared under ORGH will return its hub address when referenced.

    A symbol declared under ORG will return its cog address when referenced,
    but can return its hub address, instead, if preceded by '@':

    COGINIT #0,#@newcode

    For immediate-branch and LOC address operands, "#" is used before the
    address. In cases where there is an option between absolute and relative
    addressing, the assembler will choose absolute addressing when the branch
    crosses between cog and hub domains, or relative addressing when the
    branch stays in the same domain. Absolute addressing can be forced by
    following "#" with "\".

    Addresses below $400 would be assumed to be cog/lut addresses by Pnut, methinks.
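The selection rule in the quoted text can be sketched as a toy model in Python. This is only an illustration of the rule as described, combined with the guess above that addresses below $400 are treated as cog/lut. It is not how Pnut is actually implemented, and the function names are invented.

```python
COG_LUT_LIMIT = 0x400  # per the guess above: addresses below $400 count as cog/lut


def domain(addr):
    """Classify an address as 'cog/lut' or 'hub' (toy model, by value only)."""
    return "cog/lut" if addr < COG_LUT_LIMIT else "hub"


def addressing_mode(from_addr, to_addr, force_absolute=False):
    """Pick absolute or relative addressing for a '#' operand.

    force_absolute models writing '#\\' instead of '#'.
    """
    if force_absolute:
        return "absolute"
    if domain(from_addr) != domain(to_addr):
        return "absolute"   # reference crosses between cog and hub domains
    return "relative"       # reference stays in the same domain


print(addressing_mode(0x010, 0x1000))                      # cog code -> hub label
print(addressing_mode(0x010, 0x3F0))                       # hub label below $400
print(addressing_mode(0x010, 0x3F0, force_absolute=True))  # with '#\'
```

In this model a hub label below $400 referenced from cog code is treated as being in the same domain, so relative addressing is chosen unless '#\' forces absolute, which would be consistent with the behaviour Rayman reported.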

    Melbourne, Australia
  • Rayman Posts: 8,361
    edited December 5
    I think I'd prefer the opposite way, where @ always gives hub address and "\" can give cog address...

    I saw this part in that txt file:
    but can return its hub address, instead, if preceded by '@'
    

    But missed that second part...

    Anyway, shouldn't the boundary be $800? Or, did I do my math wrong...
  • Addresses below $400 used by JMP/CALL and similar instructions are taken as LUT and cog addresses, so there are restrictions on hub addresses below $400. They cannot be used for hubexec code, only for rd/wr instructions, i.e. data or cog/lut code that can be loaded into cog/lut for execution.
    My Prop boards: P8XBlade2, RamBlade, CpuBlade, TriBlade
    Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
    Website: www.clusos.com
    Prop Tools (Index) , Emulators (Index) , ZiCog (Z80)
  • Cluso99 wrote: »
    They cannot be used for hubexec code, only for rd/wr instructions. ie data or cog/lut code that can be loaded into cog/lut for execution.
    I think Rayman's point is that LOC @ won't let you point to data for rd/wr or launch cog/lut code < $400

    Melbourne, Australia
  • jmg wrote: »
    IIRC the P2 PLL/VCO is now like most, with a SysCLK divider, and a VCO_FB_Divider, and Xtal_FB_Divider to the common PFD frequency.

    Command then looks something like
    ">Prop_PLL Sys_Div VCO_Div Xtal_Div" + some pause for PLL lock, and host Baud-redefine, and then '>' at the new higher Baud rate.

    Addit: Using this, a simple means to boost boot from a fast-UART part like EFM8UB3 becomes available :)
    With the available ~ 32kBytes of P2 code storage in the UB3, that's roughly 55~41 ms loading times (at 10 bits per byte on the wire), at the 6~8MBaud that part should be capable of.
    (plus other hard-wired delays inside P2, hopefully, those are not too great...)

    I'll bump this with news the new EFM8UB3 USB-MCU is now showing stock and prices
    88.5c/1000 gives 8-bit MCU, Full Speed USB, 40kB Flash, 3kB RAM, 5 Volt, 12-bit ADC, UART, SMBus, SPI, 13 GPIOs

    The 40k Flash is quite an increase from the EFM8UB1, and would allow multiple bridge devices to be coded.
    eg a Mass Storage device could program EEPROM on a FLiP like P1 module, or program SPI Flash on a P2 design.

    Such a MCU can also manage the Prop1/2 reset, reducing the BOM, and offer more than one bridge link.
    It may even be able to power a P1, and the 48MHz SysClk could output 6MHz to P1, to further reduce the BOM.
  • Hi Chip, looking forward to trying out V28 when you have it available.

    I had a funny bug a while ago so let me relate how it affected my system and how I worked around it.

    Normally I load up most cogs with Tachyon but get them to run an IDLE after reset, and when cog 0 finally does a coginit, the reset routine checks the cogid and, if it is zero, runs the terminal startup instead. The trouble was that my serial receive seemed to get corrupted when I was downloading a source file into Tachyon, yet a dump of the large receive buffer showed nothing wrong. Through the time-honored method of trial and error and the process of elimination, I knew that the problem was in my startups, and with the IDLE coginits disabled everything seemed to work. However, it didn't matter which one I enabled again, the bug came back. Disable them, no bug.

    (It was as if another cog was identifying as cog 0 and running as the console, stealing a character from the receive stream now and then when the timing was right)

    Was it that the coginits needed a delay between them? That seemed to work but didn't seem right, so I applied the time-honored methods again and again. Finally I decided to insert a NOP after the CLKSET #$FF and prior to the coginits. That fixed the problem, and since then I have left it at that.

    However that got me to thinking that maybe this was one of the reasons why the DE2-115 had weird stepped levels on the output pins. Anyway, food for thought.
    org
                    clkset  #$FF            ' switch to 80MHz (if pll, else 50MHz)
    reboot
                    nop                     ' seems to need delay after clkset (otherwise next coginit ids incorrectly)
                    coginit #7,#@RESET
                    coginit #6,#@RESET
                    coginit #5,#@RESET
                    coginit #4,#@RESET
                    coginit #3,#@RESET
                    coginit #2,#@RESET
                    coginit #1,#@rxcog
                    coginit #0,#@RESET      ' RESET does a COGID so that #0 can run the console instead of an IDLE loop
    
    Tachyon Forth - compact, fast, forthwright and interactive
    Tachyon Forth News Blog
    TACHYON DEMONSTRATOR
    Brisbane, Australia
  • Well done Peter. That looks a horrible bug to have almost got past.

    All of my test code has a WAITX (for the purpose of giving the PC debug terminal time to take over the comport after download) immediately following the CLKSET and is being executed correctly, afaik.
    The Prisoner's Dilemma, in english - "Selfishness beats altruism within groups. Altruistic groups beat selfish groups." - Quoted part from 2007, D.S Wilson/E.O Wilson.
  • evanh Posts: 4,428
    edited December 8
    Hmm, I don't know how to use COGINIT properly. The only way I can make it work at all is to not use any Cog-declared symbols in the "RESET" code.

    So far, I have not been able to duplicate Peter's symptom above.

    Past my bedtime ...

  • Peter,

    I can't make sense of what the trouble could be.

    Would it be possible for you to distill the erring code to the bare essentials, so that we could determine what the trouble is? Just changing speed should have no effect on the logic. It does the same thing at any speed.
  • cgracey Posts: 8,343
    edited December 9
    This seems to work okay:
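    dat	org
    
    	clkset	#$FF		' same clock switch as in Peter's startup code
    	coginit	#7,#@go		' start the go routine in cogs 7 down to 0,
    	coginit	#6,#@go		' with no delay after the clkset
    	coginit	#5,#@go
    	coginit	#4,#@go
    	coginit	#3,#@go
    	coginit	#2,#@go
    	coginit	#1,#@go
    	coginit	#0,#@go		' cog 0 is started last
    
    	org
    
    go	cogid	x		' x = this cog's ID (0..7)
    	add	x,#32		' pin number = 32 + cog ID
    lp	drvnot	x		' toggle that pin
    	waitx	##10_000_000	' wait about 10,000,000 clocks between toggles
    	jmp	#lp
    
    x	res	1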
    

    All 8 LEDs blink, anyway.
  • Peter
    Does adding a 'WC' to COGINIT make a difference?
    Melbourne, Australia
  • Peter,
    Are you still using v26, and might that be different to v27a/z/zz ?

    I have my SD card booter ready for v28. Just need to know where the SD pins will be.
    My Prop boards: P8XBlade2, RamBlade, CpuBlade, TriBlade
    Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
    Website: www.clusos.com
    Prop Tools (Index) , Emulators (Index) , ZiCog (Z80)
  • Cluso99 wrote: »
    Peter,
    Are you still using v26, and might that be different to v27a/z/zz ?

    I have my SD card booter ready for v28. Just need to know where the SD pins will be.

    [Image attachment: 551 x 435 - 9K]
  • V27z has been crashing after some time, so I've been using V26. However, in light of the fact that I have uncovered this startup bug, I could try it out on V27z again. Now, this bug is subtle, and certainly in the case of TAQOZ on V26 it was not always readily apparent, but once I went to download a file I would get all kinds of download errors which had nothing to do with the received data. The RESET routine that is used with coginit #7,#@RESET etc. immediately calls hubexec code, which does a cogid as part of the init to check for cog 0 and have it run the console, else run an idle loop. So it's not just a simple coginit.

    Once V28 is available I will try out TAQOZ on there and look for subtle problems, even by removing the nop for instance. Let's see how it goes, and I will try V28 on the CVA9 and DE2.
    Tachyon Forth - compact, fast, forthwright and interactive
    Tachyon Forth News Blog
    TACHYON DEMONSTRATOR
    Brisbane, Australia
  • This matter of v27 being flakey is really concerning me. I think this is what's been eating me up. We need to discover whatever is wrong with it. ASAP.
  • I'm a bit into compiling v28 now.