Zog - A ZPU processor core for the Prop + GNU C, C++ and FORTRAN.Now replaces S - Page 34 — Parallax Forums



Comments

  • Heater. Posts: 21,230
    edited 2011-05-20 11:24
    lonesock,

    You are really pushing the boundaries here. I thought LLVM for ZPU was in the very early stages of development.
  • lonesock Posts: 917
    edited 2011-05-20 13:12
    I'm sure you're right, but I at least want the opportunity to _try_ it! Maybe even collaborate.

    Jonathan
  • Heater. Posts: 21,230
    edited 2011-05-20 19:41
    Great stuff. From the comments I've seen on the ZPU mailing list
  • Andrey Demenev Posts: 377
    edited 2011-05-23 05:03
    Heater, what are these XORs for?
    xor     address, #$10           'Trust me, you need this.
    

    Not that I am not trusting you :) But I want to understand how this works.

    This is also interesting. What do the 2 lower bits have to do with endianness?
    xor     addr, #%11              'XOR here is an endianess fix.
    
  • Heater. Posts: 21,230
    edited 2011-05-23 06:05
    Andrey,

    Very good questions and well spotted.

    In the LOADSP and STORESP instructions an offset is extracted from the opcode and added to the stack pointer to get the required memory address.
    Turns out that the top bit of that offset is inverted. This is an undocumented feature, at least it's not mentioned in the ZPU architecture web page. Initially I thought it was something to do with using signed offsets, that bit would be the sign bit, but that is not so. I did ask Zylin about this and
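
    The inverted top bit can be checked against any zpu-elf-objdump listing. What follows is a minimal sketch in C, not Zog's PASM. It assumes the offset is the low five bits of the opcode, counted in 4-byte stack slots; the function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset from SP encoded in a LOADSP (011xxxxx) or STORESP (010xxxxx)
 * opcode. Assumption: the low 5 bits hold the slot number with bit 4
 * inverted, and each slot is one 32-bit long. */
static uint32_t sp_offset_bytes(uint8_t opcode)
{
    /* Both opcode groups have 01 in the top two bits. */
    assert((opcode & 0xC0u) == 0x40u);
    uint32_t slot = (opcode & 0x1Fu) ^ 0x10u;  /* undo the inverted top bit */
    return slot * 4u;
}
```

    With this rule the byte 0x74 gives 16 and 0x53 gives 12, which is how objdump prints them: loadsp 16 and storesp 12.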
  • jazzed Posts: 11,803
    edited 2011-05-23 06:49
    If zpu can speak little-endian, it would solve many, many issues that slow down zog.

    David Betz did a wonderful job creating a better infrastructure for external memory users like C3 and SDRAM.
    His loader performance is top notch and will probably be leveraged in the new GCC tool-chain.
  • Heater. Posts: 21,230
    edited 2011-05-23 10:06
    Jazzed,
    jazzed wrote: »
    If zpu can speak little-endian, it would solve many, many issues that slow down zog.

    You might have to elaborate. Of all the PASM instructions the interpreter has to go through to execute a ZPU instruction and access code/data in HUB and ext memory, the overhead of the endian fixes has never looked very big to me.

    When working with LONGS everything proceeds at full speed with no byte juggling to get the endianness right. That's why I reverse the byte order of every long in the binary prior to loading it to the Prop. For example I use Lonesock's F32 float object with no endianness fiddling.

    When working with bytes and words there are only a couple of extra instructions involved to get the endianness right.

    So where is endianness causing these many issues? Am I missing a point somewhere?
  • jazzed Posts: 11,803
    edited 2011-05-23 10:26
    Heater. wrote: »
    So where is endianness causing these many issues? Am I missing a point somewhere?
    Apparently the endian-ness problem makes it impossible to use HUB RAM for stack and local data while using external memory for code/globals. David may be able to share more info on this. Having stack/locals in HUB RAM would speed up zog by about a factor of 4 and make it somewhat competitive with Catalina.
  • Heater. Posts: 21,230
    edited 2011-05-23 10:36
    Ah, good point.

    Having stack/data in HUB and only code/constants in EXT RAM was always one of my primary motivations for wanting to build ZOG. The idea being that bytecodes fetched from ext RAM/ROM would use less memory bandwidth than having to use 32 bit instructions as you do with XMM.

    Now, quite why that does not work out is something I really have to find some time to investigate.
  • jazzed Posts: 11,803
    edited 2011-05-23 10:58
    Heater. wrote: »
    ... idea being that bytecodes fetched from ext RAM/ROM would use less memory bandwidth than having to use 32 bit instructions as you do with XMM.
    That reminds me. I should send you a SpinSocket-Flash kit. SpinSocket-Flash is a module that has a Propeller and 4MB byte-wide flash on a DIP32 footprint. Along with the SpinSocket Platform board, it makes a development environment that fits your idea very nicely.

    Two power options are available. Would you like one with low power battery support or higher power 3.3V regulation? The SpinSocket Platform board will support either type and has a LiPo charger.

    I'm running David's XBASIC on SpinSocket-Flash now. I'm also running it on GameBaby. The program is stored in flash and David's XBASIC PASM VM does all the work. Zog and Catalina are both hard pressed to interpret XBASIC byte codes fast enough, so he wrote a separate VM. It's a nice language as BASICs go ... like VB6 in some ways.

    End of shameless plug and offer of a free sample .... Is the post still on strike over there?
  • Heater. Posts: 21,230
    edited 2011-05-23 11:26
    OK Jazzed, I'm up for it. Battery power sounds good. You still have my address? Hopefully I get a bit of a long holiday during June/July when most of Finland shuts down for the summer. The post guys are working well but now Iceland is throwing a lot of s*** into the flight path again:)

    Odd thing is that Zog runs fine with code, data and stack in HUB so I'm not sure where it goes wrong moving code/const to ext space. I might go back to my basic Zog version (2.6 or so) and see what I can see with David's memory map and modified linker script ideas. The linker scripts were always something that thwarted me.
  • David Betz Posts: 14,516
    edited 2011-05-23 11:35
    Heater. wrote: »
    OK Jazzed, I'm up for it. Battery power sounds good. You still have my address? Hopefully I get a bit of a long holiday during June/July when most of Finland shuts down for the summer. The post guys are working well but now Iceland is throwing a lot of s*** into the flight path again:)

    Odd thing is that Zog runs fine with code, data and stack in HUB so I'm not sure where it goes wrong moving code/const to ext space. I might go back to my basic Zog version (2.6 or so) and see what I can see with David's memory map and modified linker script ideas. The linker scripts were always something that thwarted me.
    Actually, ZOG runs okay with the stack in hub memory. The problem is that loading COGs with coginit doesn't work correctly anymore. I think it's because only long accesses to hub memory work correctly. Byte accesses are in the wrong order.
  • Heater. Posts: 21,230
    edited 2011-05-23 11:43
    David,
    Great, you may have saved me from a wild goose chase.
    COGINIT from C used to work for me. That's how I start and use FullDuplexSerial and F32 for example. Time to start digging again.
    Can you tell me the current memory map you are using, HUB, COG, EXT RAM addresses as seen by C? So that I can apply that to my basic ZOG set up?
  • jazzed Posts: 11,803
    edited 2011-05-23 13:48
    Heater. wrote: »
    OK Jazzed, I'm up for it. Battery power sounds good. You still have my address? Hopefully I get a bit of a long holiday during June/July when most of Finland shuts down for the summer.
    I have your address assuming you haven't moved .... I believe you have my email address in case.

    June/July? Short time-frame. I guess that's a great time to have vacation so far north. I'll try to get something off to you within a week or so. I have another care package to send out so one stop at the post will be nice.

    I fully intend to get an SDRAM and a SpinSocket-Flash driver working on Catalina by summer's end.

    Want to race? :) Just kidding. I'm very busy with GameBaby right now.
  • Heater. Posts: 21,230
    edited 2011-05-23 14:11
    No, I have not moved.

    Short time frame, yes, after that the Prop II comes out don't forget. Then it's a long dark winter here porting everything to that:)
  • Andrey Demenev Posts: 377
    edited 2011-05-23 20:49
    I still cannot get the endianness issue. I am compiling this program:
    volatile unsigned long xxxx = 0x12345678;
    
    void _premain(void)
    {
    	main();
    }
    
    int fibo(unsigned int n)
    {
    	if (n < 2) return n;
    	return fibo(n-2) + fibo(n-1);
    }
    
    int main(void)
    {
    	(void)xxxx; // do not optimize it out
    	fibo(12);
    	return 0;
    }
    
    using a modified crt0.S
    zpu-elf-gcc   -Os -phi  -Wl,--relax -Wl,--gc-sections   -nostdlib test.c crt0_phi.S
    
    LC_ALL=C zpu-elf-objdump -D -z a.out
    
    
    a.out:     file format elf32-zpu
    
    Disassembly of section .fixed_vectors:
    
    00000000 <_memreg>:
       0:	0b          	nop
       1:	0b          	nop
       2:	0b          	nop
       3:	0b          	nop
       4:	82          	im 2
       5:	70          	loadsp 0
       6:	0b          	nop
       7:	0b          	nop
       8:	0b          	nop
       9:	80          	im 0
       a:	e4          	im -28
       b:	0c          	store
       c:	3a          	config
       d:	0b          	nop
       e:	0b          	nop
       f:	0b          	nop
      10:	80          	im 0
      11:	d7          	im -41
      12:	04          	poppc
      13:	00          	breakpoint
      14:	00          	breakpoint
      15:	00          	breakpoint
      16:	00          	breakpoint
      17:	00          	breakpoint
      18:	00          	breakpoint
      19:	00          	breakpoint
      1a:	00          	breakpoint
      1b:	00          	breakpoint
      1c:	00          	breakpoint
      1d:	00          	breakpoint
      1e:	00          	breakpoint
      1f:	00          	breakpoint
    Disassembly of section .text:
    
    00000020 <fibo>:
      20:	fe          	im -2
      21:	3d          	pushspadd
      22:	0d          	popsp
      23:	74          	loadsp 16
      24:	70          	loadsp 0
      25:	53          	storesp 12
      26:	53          	storesp 12
      27:	81          	im 1
      28:	73          	loadsp 12
      29:	27          	ulessthanorequal
      2a:	92          	im 18
      2b:	38          	neqbranch
      2c:	fe          	im -2
      2d:	13          	addsp 12
      2e:	51          	storesp 4
      2f:	f0          	im -16
      30:	3f          	callpcrel
      31:	80          	im 0
      32:	08          	load
      33:	ff          	im -1
      34:	14          	addsp 16
      35:	52          	storesp 8
      36:	52          	storesp 8
      37:	e8          	im -24
      38:	3f          	callpcrel
      39:	80          	im 0
      3a:	08          	load
      3b:	12          	addsp 8
      3c:	52          	storesp 8
    
    0000003d <.L1>:
      3d:	71          	loadsp 4
      3e:	80          	im 0
      3f:	0c          	store
      40:	84          	im 4
      41:	3d          	pushspadd
      42:	0d          	popsp
      43:	04          	poppc
    
    00000044 <main>:
      44:	ff          	im -1
      45:	3d          	pushspadd
      46:	0d          	popsp
      47:	80          	im 0
      48:	dc          	im -36
      49:	08          	load
      4a:	52          	storesp 8
      4b:	8c          	im 12
      4c:	51          	storesp 4
      4d:	d2          	im -46
      4e:	3f          	callpcrel
      4f:	80          	im 0
      50:	0b          	nop
      51:	80          	im 0
      52:	0c          	store
      53:	83          	im 3
      54:	3d          	pushspadd
      55:	0d          	popsp
      56:	04          	poppc
    
    00000057 <_premain>:
      57:	ec          	im -20
      58:	3f          	callpcrel
      59:	04          	poppc
    Disassembly of section .data:
    
    0000005c <__data_start>:
      5c:	12          	addsp 8
      5d:	34          	storeb
      5e:	56          	storesp 24
      5f:	78          	loadsp 32
    
    00000060 <_hardware>:
      60:	00          	breakpoint
      61:	00          	breakpoint
      62:	00          	breakpoint
      63:	00          	breakpoint
    
    00000064 <_cpu_config>:
      64:	00          	breakpoint
      65:	00          	breakpoint
      66:	00          	breakpoint
      67:	00          	breakpoint
    
    zpu-elf-objcopy -j .text -j .data -j .fixed_vectors -O binary a.out a.bin ; hexdump -C a.bin
    00000000  0b 0b 0b 0b 82 70 0b 0b  0b 80 e4 0c 3a 0b 0b 0b  |.....p......:...|
    00000010  80 d7 04 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    00000020  fe 3d 0d 74 70 53 53 81  73 27 92 38 fe 13 51 f0  |.=.tpSS.s'.8..Q.|
    00000030  3f 80 08 ff 14 52 52 e8  3f 80 08 12 52 71 80 0c  |?....RR.?...Rq..|
    00000040  84 3d 0d 04 ff 3d 0d 80  dc 08 52 8c 51 d2 3f 80  |.=...=....R.Q.?.|
    00000050  0b 80 0c 83 3d 0d 04 ec  3f 04 00 00 12 34 56 78  |....=...?....4Vx|
    00000060  00 00 00 00 00 00 00 00                           |........|
    00000068
    

    I see the bytes in the binary file in the same order as they are in the disassembler dump. Analyzing the neqbranch and callpcrel instructions also does not show anything unusual, but the long in the data section (0x12345678) IS big endian.
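
    The call targets and data addresses in the listing can be checked by hand. The sketch below assumes the usual ZPU rules: the first IM byte pushes a sign-extended 7-bit value, each directly following IM byte shifts that value left by 7 and appends 7 more bits, and CALLPCREL adds the popped value to the address of the CALLPCREL opcode itself. The helper names are illustrative and the addresses are read off the dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a run of consecutive IM opcodes (1xxxxxxx) into one value. */
static int32_t decode_im(const uint8_t *code, size_t count)
{
    assert(count >= 1 && (code[0] & 0x80));
    /* First IM: sign-extend the 7-bit payload. */
    int32_t value = (int32_t)(code[0] & 0x7F);
    if (value & 0x40)
        value -= 0x80;
    /* Each further IM shifts left by 7 and appends its payload. */
    for (size_t i = 1; i < count; i++) {
        assert(code[i] & 0x80);
        value = (int32_t)(((uint32_t)value << 7) | (uint32_t)(code[i] & 0x7F));
    }
    return value;
}

/* Target of a CALLPCREL located at address pc, given the popped offset. */
static uint32_t callpcrel_target(uint32_t pc, int32_t offset)
{
    return pc + (uint32_t)offset;
}
```

    The byte f0 at 0x2f decodes to -16, so the callpcrel at 0x30 lands on 0x20 (fibo). The pair 80 dc in main decodes to 0x5c, the address of xxxx. Neither involves any byte swapping, which fits the observation that the code stream looks ordinary.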

    Can't get how this matches what the docs say:
    The instructions are stored big endian. That is the first instruction is stored in the most significant byte, and the forth is in the least significant byte.
  • jazzed Posts: 11,803
    edited 2011-05-23 22:09
    Andrey Demenev wrote: »
    I still cannot get the endianness issue
    The Zog VM has many XORs to handle the endianness problems. If it was just built to be little endian, it would run faster on Propeller just because the XORs can be removed. Unfortunately zpu tools won't make a little endian image. I'd like to see a GCC port that emits PASM with macros to handle jumps and data manipulation, but that may only happen with Propeller 2 stuff ... hard to tell just now, so if you want Propeller 1 GCC, Zog may be the only choice and it would be best to optimize it while there is time.
  • Andrey Demenev Posts: 377
    edited 2011-05-24 01:05
    After almost a whole day of searching, editing and compiling (mostly searching), I think I have made a version of GCC and binutils that produces something looking like a binary for little-endian ZPU!

    I am going to do some tests and grab some beer (is that the correct order? maybe beer first? :) ), and will post more info here
  • Heater. Posts: 21,230
    edited 2011-05-24 01:16
    Andrey,

    To be honest, this endianness issue gives me a headache. If I think about it long enough the thing starts flipping from one end to the other and back again. Especially when interfacing ZPU bigendian code with Prop littleendian code. Or is that the other way around :) It's like looking at one of those optical illusions where, as you stare at it, the image flips from being one thing to being some other thing and I end up being hopelessly confused.

    Now, what you have compiled and disassembled there looks perfectly normal and familiar. So let's concentrate on the first endian issue:

    In that code you have defined an initialised integer xxxx = 0x12345678. As you see in the disassembly it is stored in the image, and hence memory, as the sequence of bytes 12, 34, 56, 78 going up memory. That is most significant byte first or bigendian.

    Conversely the Prop is littleendian as can be seen by making a similar initialized long in PASM:
    xxxx    long $12345678
    
    Results in this in the BST listing output:
    0018(0000) 78 56 34 12 | xxxx    long $12345678
    

    So we have a problem. Let's say we had in C:
    volatile unsigned long xxxx = 0xFF000000;
    volatile unsigned long yyyy = 0x01000000;
    result = xxxx + yyyy;
    

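    We want the result to be 0x00000000 (the carry out of the top byte is discarded), but the Prop, operating on these as RDLONG, RDLONG, ADD, WRLONG, reads the two values as $000000FF and $00000001, adds them to get $00000100 and writes back bytes that the ZPU side sees as $00010000. (Is that right?)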
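
    The arithmetic can be simulated on any host. This sketch models the Prop's RDLONG/WRLONG as little-endian loads and stores and the ZPU's view of memory as big-endian; the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian load/store: the byte order RDLONG and WRLONG use. */
static uint32_t load_le(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void store_le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Big-endian load/store: how zpu-gcc lays out a long in the image. */
static uint32_t load_be(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

static void store_be(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);  p[3] = (uint8_t)v;
}

/* Store a and b big-endian, add them using little-endian accesses,
 * and return the sum as the ZPU program would read it back. */
static uint32_t add_with_wrong_endianness(uint32_t a, uint32_t b)
{
    uint8_t xa[4], xb[4], res[4];
    store_be(xa, a);
    store_be(xb, b);
    assert(load_be(xa) == a);  /* sanity: big-endian round trip */
    store_le(res, load_le(xa) + load_le(xb));
    return load_be(res);
}
```

    For 0xFF000000 + 0x01000000 this returns 0x00010000 instead of 0. Sums with no carry between bytes, such as 0x01020304 + 0x10203040, still come out right, so a bug like this can hide for a while.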

    How to fix this?
    a) Arrange that whenever the emulator reads a LONG all the bytes within the long are reversed in order prior to use. When writing results all the bytes are reversed again.

    This is clearly going to slow the emulation down a lot. The ZPU spends most of its time dealing with LONGS on the stack. Not Good.

    b) Let's reverse the byte order of every four bytes of the binary executable. Either as we load it to memory or as a last step in the build process.

    That's nice, now all our initialised data and constants are the right way round. ZOG can operate on LONGS all day, at speed, without error. C under ZOG and other SPIN/PASM code can exchange LONGS without any worries about byte order.
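
    Option b) amounts to a small post-processing pass over the binary. The sketch below is an illustration in C, not the tool actually used for Zog, and it assumes the image has been padded to a multiple of four bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of every 32-bit long in a ZPU image, in place.
 * Assumption: len is a multiple of 4. */
static void swap_longs(uint8_t *image, size_t len)
{
    assert(len % 4 == 0);
    for (size_t i = 0; i + 4 <= len; i += 4) {
        uint8_t b0 = image[i], b1 = image[i + 1];
        image[i]     = image[i + 3];
        image[i + 1] = image[i + 2];
        image[i + 2] = b1;
        image[i + 3] = b0;
    }
}
```

    Applied to the four bytes 12 34 56 78 of xxxx this yields 78 56 34 12, the same layout BST shows for a native PASM long.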

    Ah but. Now we have all our bytecodes in the wrong order!

    No worries, for the cost of a single XOR when reading a byte code we can be sure to always pick up the right one (xor memp, #%11). Why? Inverting bit 1 steers the access up or down one WORD from the given code address; inverting bit 0 steers the access up or down one byte from the given code address, resulting in the correct actual address in Propeller space.

    Similarly reads and writes to BYTES are fixed with xor %11.

    Similarly accesses to WORDS are fixed with xor %10, which steers to the correct WORD in Propeller memory space.

    There we are, job done!
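
    How the XOR picks the right location can be checked on a host. This sketch assumes the image has already had every long byte-reversed and models hub memory as a plain byte array. It flips the low two address bits (XOR with 3) for bytes and only bit 1 (XOR with 2) for aligned words; the names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Fetch the byte the ZPU expects at zpu_addr from a long-swapped image. */
static uint8_t read_byte(const uint8_t *swapped, uint32_t zpu_addr)
{
    return swapped[zpu_addr ^ 3u];
}

/* Fetch the 16-bit word the ZPU expects at zpu_addr (must be even). */
static uint16_t read_word(const uint8_t *swapped, uint32_t zpu_addr)
{
    assert((zpu_addr & 1u) == 0);
    uint32_t a = zpu_addr ^ 2u;
    /* Native little-endian 16-bit read at the steered address. */
    return (uint16_t)(swapped[a] | (swapped[a + 1] << 8));
}
```

    With the long 12 34 56 78 stored swapped as 78 56 34 12, read_byte at offset 0 returns 0x12 and read_word at offsets 0 and 2 returns 0x1234 and 0x5678, the values a big-endian ZPU program expects.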

    Except...This does cause some issues with interfacing to PASM and SPIN code when you want to exchange BYTES and WORDS. For example for Spin to write a string to memory so that ZOG gets it right the Spin code has to reverse every four bytes. This can be done with the XOR trick as well.
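
    The same address flip works in the writing direction. Below is a sketch of what the Spin side has to do when handing a string to ZOG, written in C for illustration with hub memory as a byte array. The function name and the buffer layout are assumptions, and the buffer must cover every whole long the string touches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy a NUL-terminated string into long-swapped memory so that
 * byte reads at (address ^ 3) see the characters in order. */
static void write_string_for_zpu(uint8_t *hub, uint32_t zpu_addr, const char *s)
{
    assert(hub != NULL && s != NULL);
    size_t i = 0;
    do {
        /* Flip the low two address bits for every byte stored. */
        hub[(zpu_addr + (uint32_t)i) ^ 3u] = (uint8_t)s[i];
    } while (s[i++] != '\0');  /* the terminator is copied too */
}
```

    Writing "Hello" at address 0 leaves the bytes l l e H in the first long and the o in the top byte of the second, which is the "reverse every four bytes" layout described above.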

    N.B. This trick only works correctly because ZPU code should never access WORDS on an odd address or LONGS that are not 4 byte aligned.

    After all that I'm not sure about the original question.
    Andrey Demenev wrote: »
    I see the bytes in binary file in same order as they are in disassembler dump.

    Sure you do, they are both just displaying bytes going up memory.
    Andrey Demenev wrote: »
    Can't get how this matches what the docs say:
    The instructions are stored big endian. That is the first instruction is stored in the most significant byte, and the forth is in the least significant byte.

    And so they do. If you were to write a C program to read an integer from address $4 you would get the number 0x82700b0b. I.e. bigendian.
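
    That value can be reproduced from the hexdump. This is a sketch with a hand-written big-endian load over the first eight bytes of a.bin; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* First eight bytes of a.bin, copied from the hexdump in this thread. */
static const uint8_t image[8] = { 0x0b, 0x0b, 0x0b, 0x0b, 0x82, 0x70, 0x0b, 0x0b };

/* Big-endian 32-bit load: the lowest address holds the most significant byte. */
static uint32_t read_long_be(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr] << 24 | (uint32_t)mem[addr + 1] << 16 |
           (uint32_t)mem[addr + 2] << 8 | (uint32_t)mem[addr + 3];
}

/* n-th opcode (0..3) packed in a big-endian long: opcode 0 is the top byte. */
static uint8_t opcode_in_long(uint32_t word, unsigned n)
{
    assert(n < 4);
    return (uint8_t)(word >> (24u - 8u * n));
}
```

    read_long_be(image, 4) gives 0x82700b0b, and opcode 0 of that long is 0x82 (im 2), the first instruction at address 4. That is what the docs mean by the first instruction sitting in the most significant byte.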

    If anyone can see a better way to sort this mess I'd like to hear it. Developing a little endian zpu-gcc is not on the cards for me.
  • Heater. Posts: 21,230
    edited 2011-05-24 01:17
    Good grief Andrey,

    In the time it takes me to write all that you have fixed the issue at source :)
  • Heater. Posts: 21,230
    edited 2011-05-24 01:30
    I'd go for the beer first, take a break and then look at what there is:)

    In the first post of this thread there is a ZPU VM in C that runs under Linux. It might help with testing. Of course its BYTE and WORD access routines are now backwards for you?

    If you could put your compiler up somewhere I could try and find some time to do some testing as well.
  • Andrey Demenev Posts: 377
    edited 2011-05-24 01:32
    Ahh, reversing the byte order in the code section was the missing part. Now I think I understand that completely.

    Heater, is there an easy way to run Zog on bare Propeller, without any external memory?
  • Heater. Posts: 21,230
    edited 2011-05-24 02:13
    Yep, on page 26 of this thread there is zog_v1_6 which is the last version I put out I think. Gosh that was a long time ago.
    Anyway, with #define USE_HUB_MEMORY uncommented and #define USE_JCACHED_MEMORY and #define USE_VIRTUAL_MEMORY commented out, it should run fibo.bin, which is a ZPU binary executable included via a Spin "file" statement.

    You can try using #define SINGLE_STEP, which will execute one byte code every time you press a key.
  • David Betz Posts: 14,516
    edited 2011-05-24 03:58
    Andrey Demenev wrote: »
    After almost a whole day of searching, editing and compiling (mostly searching), I think I have made a version of GCC and binutils that produces something looking like a binary for little-endian ZPU!

    I am going to do some tests and grab some beer (is that the correct order? maybe beer first? :) ), and will post more info here

    Wow! That's great news! I'd love to have a copy of the changes you made to get this to work.
  • Heater. Posts: 21,230
    edited 2011-05-24 04:16
    Yep, totally fantastic.

    What with:
    a) Andrey's little endian zpu-gcc
    b) Your linker scripts that can get code/constants out into ext memory whilst stack/data are in HUB.

    ZOG is finally going to show XMM solutions what it can do when you have code in ext serial FLASH/RAM:)

    You guys are great.
  • Andrey Demenev Posts: 377
    edited 2011-05-24 05:52
    Attached is the diff against the latest version pulled from http://repo.or.cz/w/zpu.git/

    No guarantee that it works as expected - I have not tested such things as relocations. Simple tests seem OK
    andrey@debian:/tmp/z$ cat test.c
    volatile long long zzzz = 0x800098765432ULL;
    
    void _premain(void)
    {
    	main();
    }
    
    
    int main(void)
    {
    	zzzz = 3;
    	return 0;
    }
    andrey@debian:/tmp/z$ zpu-gcc -g2  -Os -phi  -Wl,--relax -Wl,--gc-sections  -nostdlib test.c crt0_phi.S
    andrey@debian:/tmp/z$ LC_ALL=C zpu-readelf -h a.out
    ELF Header:
      Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
      Class:                             ELF32
      Data:                              2's complement, little endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              EXEC (Executable file)
      Machine:                           Zylin ZPU
      Version:                           0x1
      Entry point address:               0x0
      Start of program headers:          52 (bytes into file)
      Start of section headers:          944 (bytes into file)
      Flags:                             0x0
      Size of this header:               52 (bytes)
      Size of program headers:           32 (bytes)
      Number of program headers:         2
      Size of section headers:           40 (bytes)
      Number of section headers:         17
      Section header string table index: 14
    andrey@debian:/tmp/z$ LC_ALL=C zpu-objdump -d -z -j .fixed_vectors -j .text -j .data -j .bss
    
    a.out:     file format elf32-zpu
    
    Disassembly of section .fixed_vectors:
    
    00000000 <_memreg>:
       0:	0b          	nop
       1:	0b          	nop
       2:	0b          	nop
       3:	0b          	nop
       4:	82          	im 2
       5:	70          	loadsp 0
       6:	0b          	nop
       7:	0b          	nop
       8:	0b          	nop
       9:	80          	im 0
       a:	c4          	im -60
       b:	0c          	store
       c:	3a          	config
       d:	0b          	nop
       e:	0b          	nop
       f:	0b          	nop
      10:	0b          	nop
      11:	b3          	im 51
      12:	04          	poppc
      13:	00          	breakpoint
      14:	00          	breakpoint
      15:	00          	breakpoint
      16:	00          	breakpoint
      17:	00          	breakpoint
      18:	00          	breakpoint
      19:	00          	breakpoint
      1a:	00          	breakpoint
      1b:	00          	breakpoint
      1c:	00          	breakpoint
      1d:	00          	breakpoint
      1e:	00          	breakpoint
      1f:	00          	breakpoint
    Disassembly of section .text:
    
    00000020 <main>:
      20:	ff          	im -1
      21:	3d          	pushspadd
      22:	0d          	popsp
    
    00000023 <.LM2>:
      23:	83          	im 3
      24:	51          	storesp 4
      25:	80          	im 0
      26:	71          	loadsp 4
      27:	b8          	im 56
      28:	0c          	store
      29:	bc          	im 60
      2a:	0c          	store
    
    0000002b <.LM3>:
      2b:	80          	im 0
      2c:	0b          	nop
      2d:	80          	im 0
      2e:	0c          	store
    
    0000002f <.LM4>:
      2f:	83          	im 3
      30:	3d          	pushspadd
      31:	0d          	popsp
      32:	04          	poppc
    
    00000033 <_premain>:
      33:	ec          	im -20
      34:	3f          	callpcrel
      35:	04          	poppc
    Disassembly of section .data:
    
    00000038 <__data_start>:
      38:	32          	xor
      39:	54          	storesp 16
      3a:	76          	loadsp 24
      3b:	98          	im 24
      3c:	00          	breakpoint
      3d:	80          	im 0
      3e:	00          	breakpoint
      3f:	00          	breakpoint
    
    00000040 <_hardware>:
      40:	00          	breakpoint
      41:	00          	breakpoint
      42:	00          	breakpoint
      43:	00          	breakpoint
    
    00000044 <_cpu_config>:
      44:	00          	breakpoint
      45:	00          	breakpoint
      46:	00          	breakpoint
      47:	00          	breakpoint
    andrey@debian:/tmp/z$
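
    Two details in the output above point to a little-endian build: byte 5 of the ELF magic (EI_DATA) is 01, and the eight .data bytes of zzzz read low byte first. Here is a sketch of both checks, with the byte values copied from the listing. The constant and function names are illustrative; the value 1 for little-endian data is taken from the ELF specification.

```c
#include <assert.h>
#include <stdint.h>

enum { EI_DATA_INDEX = 5, ELF_DATA_LSB = 1, ELF_DATA_MSB = 2 };

/* e_ident from "Magic:" and the bytes at 0x38..0x3f of the .data section. */
static const uint8_t e_ident[16] = { 0x7f, 0x45, 0x4c, 0x46, 0x01, 0x01, 0x01 };
static const uint8_t zzzz_bytes[8] = { 0x32, 0x54, 0x76, 0x98, 0x00, 0x80, 0x00, 0x00 };

/* True when the ELF identification marks the file as little-endian. */
static int elf_is_little_endian(const uint8_t *ident)
{
    assert(ident[0] == 0x7f && ident[1] == 'E' && ident[2] == 'L' && ident[3] == 'F');
    return ident[EI_DATA_INDEX] == ELF_DATA_LSB;
}

/* Little-endian 64-bit load: the lowest address holds the least significant byte. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}
```

    load_le64(zzzz_bytes) gives 0x800098765432, the initialiser in test.c, so the data section really is laid out little-endian.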
    
  • Andrey Demenev Posts: 377
    edited 2011-05-24 06:01
    This is the startup file I am using
  • David Betz Posts: 14,516
    edited 2011-05-24 06:02
    Thanks! Have you rebuilt newlib in little-endian mode yet?
  • Andrey Demenev Posts: 377
    edited 2011-05-24 06:05
    Nope. Just making sure that compiler and binutils are working, first
  • jazzed Posts: 11,803
    edited 2011-05-24 06:36
    Looks like a great day for ZOG. Thanks Andrey.