The 6502 implemented LE instead of the BE used by its predecessor, the 6800. The main reason was that it saves cycles in operations that require adding 16-bit numbers (including address indexing) on an 8-bit bus. With LE you can fetch the least significant byte first and start working with it immediately while the next byte is being fetched.
The 16-bit (over 8-bit bus) addressing example still requires both bytes before the address can be used. I suspect a bigger adder circuit would resolve whatever the concern is here. Could do with some more detail.
A bigger adder doesn't help if you can only fetch 8 bits at a time. If you look at the addition example, you add two 16-bit values (assume one of them is already in a 16-bit register) by first adding the least significant bytes, then the most significant bytes plus any carry from the previous operation. The point with LE is that the 6502 (and the 8088) could do that first addition in parallel with fetching the most significant byte. The 6800 would fetch the most significant byte, then the least significant byte, *then* start the addition the normal way (add the least significant bytes, then the most significant bytes plus carry).
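To make the byte-by-byte sequence concrete, here is a minimal C sketch of that addition (the function name and the split-byte arguments are just for illustration; a real 8-bit CPU does this in its ALU, not in C):

#include <stdint.h>

/* add two 16-bit values one byte at a time, the way an 8-bit CPU must:
   least significant bytes first, then most significant bytes plus carry */
uint16_t add16_bytewise(uint8_t a_lo, uint8_t a_hi,
                        uint8_t b_lo, uint8_t b_hi)
{
    uint16_t lo    = (uint16_t)a_lo + b_lo;          /* first add: LSBs         */
    uint8_t  carry = (uint8_t)(lo >> 8);             /* carry out of the LSBs   */
    uint8_t  hi    = (uint8_t)(a_hi + b_hi + carry); /* second add: MSBs + carry */
    return (uint16_t)(hi << 8) | (uint8_t)lo;
}

With a little-endian operand, a_lo/b_lo are the bytes that arrive from memory first, so the first add can overlap the fetch of the high bytes; with a big-endian operand both fetches have to complete before the first add can even start.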
All human-readable languages, and even all computing formats, write/display numbers in BE order.
I take it you prefer to work with BE... and so do I. I just wish the USA would do the same with their date format, instead of the mixed endianness in use now.
LE has no real technical advantages over BE.
Unfortunately this is not always the case, at least for architectures where the number of bytes that can be fetched per cycle is less than the word sizes used. In the old days of the 6502 and the Intel and Zilog 8-bitters the advantage to LE, from a technical point of view, was not insignificant.
Edit: I had a look and it's mentioned on Wikipedia: https://en.wikipedia.org/wiki/Endianness#Calculation_order
although they don't get into the specific optimization the 6502 used, its ability to do the first addition in parallel with fetching the next byte.
Elsewhere I saw it stated like this for the indexing case:
The 6502 can fetch the next byte of a complex opcode while it is currently executing. Take for example an indexed load, something like LDA $1000,x. Instruction and parameter take three bytes. This is a two-step instruction: add the index to the address, then load. The 6502 interleaves loading the second byte of the address with the indexing addition, which speeds up the load by one cycle. This is possible because the byte order is little-endian.
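A rough C model of that interleaving (purely illustrative, not how the silicon is wired): only the low address byte and the index register are needed for the first step, so it can overlap the fetch of the high byte, whose only remaining job is to absorb the carry.

#include <stdint.h>

/* step 1: can run while the high address byte is still being fetched */
typedef struct { uint8_t lo; uint8_t carry; } partial_sum;

partial_sum index_low(uint8_t base_lo, uint8_t x)
{
    uint16_t s = (uint16_t)base_lo + x;
    partial_sum p = { (uint8_t)s, (uint8_t)(s >> 8) };
    return p;
}

/* step 2: once the high byte has arrived, fold in the carry */
uint16_t index_high(partial_sum p, uint8_t base_hi)
{
    return (uint16_t)(((base_hi + p.carry) << 8) | p.lo);
}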
A bigger (16b+16b) adder would just do a single addition once both bytes are loaded. Obviously this takes more circuitry ... but also executes faster.
If you want to start picking apart old tech like that, then back then there wasn't anything like burst transactions either; DRAM random access was all equal down to the byte... so fetching the second byte first is just as quick.
Obviously we're talking historical reasons for LE here. If you can get to all the bytes simultaneously, which is generally the case these days, LE gives no performance enhancement over BE (there's only the dubious, but not completely bogus 'cast' argument left). But Intel can't change their processors, they've based everything on being backwards compatible all the way back - even to the 8080, with the help of an assembly translator. So they'll stay LE. ARM and MIPS can be configured either way. SPARC is BE due to Sun's previous processor, the MC68k. And so on and so forth.
For new processors without baggage, and if all bytes can be fetched in the same cycle, there's not much technical reason to choose LE. But by then you may have programmers who have read hex in LE for so long that they're comfortable with it. Or they want to exchange binary data with an Intel CPU (never mind padding and alignment issues).
But yes I prefer BE if I can choose.
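Since the 'cast' argument came up above, it's easy to demonstrate in C (the output depends on the host's byte order, and memcpy stands in for an actual pointer cast to keep it well-defined):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    uint32_t wide = 0x01020304;
    uint16_t narrow;
    memcpy(&narrow, &wide, sizeof narrow);   /* the first two bytes in memory   */
    /* little-endian host prints 0304, big-endian host prints 0102 */
    printf("%04x\n", (unsigned)narrow);
    return 0;
}

On LE the low-order part of a wider integer sits at the same address as the whole thing, so reading through a narrower type gives the truncated value for free; on BE it gives the high-order part instead.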
It may come as a surprise to you but the Propeller is both BE and LE!
When you define a long in a DAT section the bytes sit in memory one way around; when you use a literal constant in your Spin code the bytes sit in memory the other way around.
Normally we don't get to see this in any Spin/PASM programming we do as we don't look at those literal constants sitting in the Spin byte codes. But you can see it if you look at the listings produced by BST.
Object DAT Blocks
|===========================================================================|
0018(0000) 04 03 02 01 | aLong long $01020304
|===========================================================================|
|===========================================================================|
Spin Block start with 0 Parameters and 0 Extra Stack Longs. Method 1
PUB start
Local Parameter DBASE:0000 - Result
|===========================================================================|
6 x := $01020304
Addr : 001C: 3B 01 02 03 04 : Constant 4 Bytes - 01 02 03 04 - $01020304 16909060
Addr : 0021: 41 : Variable Operation Global Offset - 0 Write
Addr : 0022: 32 : Return
Did I say "byte" codes? I suspect this is done for the same reason as described above for the 6502, it's quicker and easier for Spin's stack based byte code interpreter to work with numbers if they are the "right" way around.
... But Intel can't change their processors, they've based everything on being backwards compatible all the way back ...
Hmm, we probably shouldn't be talking about the PC, but since you're insisting... The PC industry, including Intel, has a long history of rearranging things. ISA got ditched for PCI, for example. 32-bit mode was a pretty big change that took a long time to drag everyone along... 64-bit mode is an even bigger shift in the programming model, but that was hardly even a bump in the road. The reason? There is a large amount of user software insulation these days. You could say general computing demands it. The Mac switched from BE to LE at the drop of a hat and still supported legacy programs. Intel probably added new instructions for that deal.
Intel doesn't see any reason to switch is all. Who still debugs with simple debuggers on Windozes any longer?
Could we move this BE/LE discussion off to its own thread instead of having it buried in the middle of Chip's thread for releasing and discussing the P2 FPGA image releases?
I, the master of irrelevancy, don't see how this is at all related to the P2 or the FPGA unless we are going to flip-flop endianness at this point in the P2 design.
I just posted an updated link to the latest file at the top of this thread.
Cool! Now I have 11 LEDs flashing on my DE2-115 board. Need to get gas working with the new instruction set. Has the instructions.txt file been updated to match the new image?
I also got it working...
The phasing of the LEDs is kinda interesting...
I suppose LED10 should blink almost twice as fast as LED0. I guess it looks right.
Here's some oscilloscope screenshots of the cog_1k_program.spin output with latest release DE2-115.
Zooming in on the burst...
Yellow is P4 and Green is P0. (I've got bandwidth limit on so it looks smoother).
Huh. I think higher-numbered cogs should be blinking slower than lower-numbered cogs, as the delay is ((cogid+16)<<18). Try taking it to a single cog (comment out all of the coginits) and see if LED 0 or LED 10 lights up.
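Plugging numbers into that formula (a quick C check, nothing P2-specific about it):

#include <stdio.h>

int main(void)
{
    /* delay = (cogid + 16) << 18 clocks, per the code below */
    for (int cogid = 0; cogid <= 10; cogid++)
        printf("cog %2d: %ld clocks\n", cogid, ((long)cogid + 16) << 18);
    return 0;
}

Cog 0 waits 4,194,304 clocks per toggle and cog 10 waits 6,815,744, so LED10 should blink roughly 1.6x slower than LED0, not faster.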
Here's the all_cogs_blink demo (for 2015-09-29 image), rewritten to use REP instead of JMP. This uses the new @ syntax and #0 to indicate infinite repeats. Because REP does not work in hub exec mode (bummer), this requires the code to be copied to cog memory before executing. Note also that each cog now COGINIT's the next cog. This was just to shorten the code I was looking at.
where exactly does the started COG start another one?
dat
        orgh    1
' each cog:
' 1. Copies code to cog memory and jumps to cog exec mode
' 2. launches the next cog
' 3. then blinks for eternity
' any cogs missing from the FPGA won't blink
init    coginit #16,#init               'start the next available cog, also at init
        mov     ptrb, #cog_code         'hub address of the code to copy
        setq    #(x-cog_entry)>>2       'set block length for the next rdlong
        rdlong  cog_entry, ptrb         'block-copy the code from hub into cog RAM
        jmp     #cog_entry              'carry on in cog exec mode

cog_code
        org     8 << 2
cog_entry
blink   rep     @:end,#0                'repeat the block below forever
        cogid   x                       'which cog am I?
        setb    dirb,x                  'make that pin an output
        notb    outb,x                  'flip its output state
        add     x,#16                   'add to my id
        shl     x,#18                   'shift it up to make it big
        waitx   x                       'wait that many clocks
:end
x       res     1
where exactly does the started COG start another one?
The very first line. You will notice that coginit starts the next available cog (#16) at instruction "#init". It then goes on to set up the cog to run in cog exec mode. In the meantime, the next cog is doing exactly the same thing (starting the next cog, setting up, etc...).
Yes. Now I see it. I was looking at the code of the COG, thinking of how you start one COG out of PASM.
Your comments are misleading.
' 1. Copies code to cog memory and jumps to cog exec mode
' 2. launches the next cog
' 3. then blinks for eternity
should be
' 1. launches the next cog
' 2. Copies code to cog memory and jumps to cog exec mode
' 3. then blinks for eternity
Intriguing. Since I don't have an FPGA I haven't tried to read any of the sources previously. Mike, you say launch then copy... this means that the SETQ + RDLONG pair somehow does a block copy, correct? Is that how a burst copy is done? What defines the length?
New image and Pnut working nicely!
Edit: Tested on Prop123-A7 and DE2-115 Ok
Did I say "byte" codes? I suspect this is done for the same reason as described above for the 6502, it's quicker and easier for Spin's stack based byte code interpreter to work with numbers if they are the "right" way around.
Hmm, we could probably not be talking about the PC but since you're insisting ... The PC industry, including Intel, have a long history of rearranging things. ISA got ditched for PCI for example. 32bit mode was a pretty big change that took a long time to drag everyone along ... 64bit mode is an even bigger shift in the programming model but that was hardly even a bump in the road, the reason? ... there is a large amount of user software insulation these days. You could say general computing demands it. The Mac switched from BE to LE on the drop of a hat and still supported legacy programs. Intel probably added new instructions for that deal.
Intel doesn't see any reason to switch is all. Who still debugs with simple debuggers on Windozes any longer?
I, the master of irrelevancy, don't see how this is at all related to the P2 or the FPGA unless we are going to flip-flop endianess at this point in the P2 design.
Thank you!!
I'll try it on my DE2 when I get home from work (shhh, don't tell anybody, there be P2s at work!!!)
Wanted to show photo of 11 LEDs flashing...
Edit: I posted without reading the whole thread.
This is great news!
I believe it's an index for one of the 4 registers: PTRA, PTRB, ADRA, ADRB.
That's correct.
That's for the constant-address version. There's also CALLD D,S which gives 512 possibilities for D.
I need to make the assembler handle the D,S/@ version such that it allows nearby @ addresses with all 512 D's.
Thanks!
Mike