Any shorter way to reverse bytes in long using only 1 extra long (in pasm)?
Cluso99
Posts: 18,069
Here is how I am reversing the byte order in a long.
I only use 1 additional cog long.
Can anyone find a shorter way?
                            ' x:  b0 b1 b2 b3
    movs  lmm_w, lmm_x      '     x  x  x  b3
    rol   lmm_x, #8         ' x:  b1 b2 b3 b0
    ror   lmm_w, #8         '     b3 x  x  x
    movs  lmm_w, lmm_x      '     b3 x  x  b0
    rol   lmm_x, #8         ' x:  b2 b3 b0 b1
    ror   lmm_w, #8         '     b0 b3 x  x
    movs  lmm_w, lmm_x      '     b0 b3 x  b1
    rol   lmm_x, #8         ' x:  b3 b0 b1 b2
    and   lmm_x, #$FF       ' x:  0  0  0  b2
    ror   lmm_w, #8         '     b1 b0 b3 x
    andn  lmm_w, #$FF       '     b1 b0 b3 0
    or    lmm_x, lmm_w      ' x:  b1 b0 b3 b2
    ror   lmm_x, #16        ' x:  b3 b2 b1 b0
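For anyone who wants to check the byte trace off-target, here is a rough Python model of that 14-instruction sequence. It is only a sketch: the helper names (rol, ror, movs, reverse_bytes_movs) are made up for illustration, and movs is modelled as copying the low 9 bits of the source into the low 9 bits of the destination, which is why the stray ninth bit has to be overwritten or masked off later.

```python
MASK32 = 0xFFFFFFFF

def rol(value, bits):
    """Rotate a 32-bit value left (models PASM rol)."""
    bits %= 32
    return ((value << bits) | (value >> (32 - bits))) & MASK32

def ror(value, bits):
    """Rotate a 32-bit value right (models PASM ror)."""
    bits %= 32
    return ((value >> bits) | (value << (32 - bits))) & MASK32

def movs(dest, src):
    """Copy the low 9 bits of src into the low 9 bits of dest (models PASM movs)."""
    return (dest & (MASK32 ^ 0x1FF)) | (src & 0x1FF)

def reverse_bytes_movs(x, w=0):
    """Model of the sequence; w stands in for lmm_w and may start as garbage."""
    for _ in range(3):
        w = movs(w, x)       # movs lmm_w, lmm_x
        x = rol(x, 8)        # rol  lmm_x, #8
        w = ror(w, 8)        # ror  lmm_w, #8
    # The thread's code does the 'and' before the third 'ror'; x and w are
    # independent at that point, so the order does not change the result.
    x &= 0xFF                # and  lmm_x, #$FF
    w &= MASK32 ^ 0xFF       # andn lmm_w, #$FF
    x |= w                   # or   lmm_x, lmm_w
    return ror(x, 16)        # ror  lmm_x, #16

print(hex(reverse_bytes_movs(0x01020304)))
```

Starting w at $FFFFFFFF gives the same answer in this model, which suggests the scratch long does not need to be cleared first.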
Comments
http://forums.parallax.com/showthread.php/89910-Endian-Puzzle
Perhaps I should explain my restrictions some more. It's for my debugger, which runs in LMM mode. It can only use normal P1 instructions that are also on the P2. I am only allowing 2 actual cog locations, one of which holds both the original and the final value. If 3 cog locations are required, then I may incur a 4-instruction penalty because I have to push and pop on my hubstack. This also means that a constant greater than 9 bits (an immediate source value) cannot be used without a large penalty.
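To make the 9-bit restriction concrete, a tiny Python check (fits_immediate is a made-up helper name, not anything from the Propeller tools) shows which constants can be written as an immediate source and which would need their own long:

```python
def fits_immediate(value):
    """True if value fits in the 9-bit immediate source field (0..511)."""
    return 0 <= value <= 0x1FF

# Shift counts and byte masks fit; a full 32-bit mask does not.
for constant in (8, 16, 0xFF, 0x1FF, 0x00FF00FF):
    print(hex(constant), fits_immediate(constant))
```

So #8, #16 and #$FF are free, while a mask such as $00FF00FF has to live in a register.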
Well if you want to avoid constants the 00FF00FF mask can be built into the temporary register thus:
I think the following will work (after looking on the other thread) in 8 instructions plus 2 cog locations x & y.
2nd copy? My code works on Prop 1...
Here is your code with bytes shown...
I have written a small Spin program that reverses the order of the bytes. You don't need any shifts or masks, you should be able to code it in 6 PASM instructions, and it only uses 1 long.
Vic20Man
CON
_clkmode = xtal1 + pll8x 'Standard clock mode * crystal frequency = 80 MHz
_xinfreq = 10_000_000
VAR
long lng
long lng1
OBJ
pst : "Parallax Serial Terminal"
PUB main
pst.start(250_000)
pst.clear
' pst.str(string("test"))
lng:=%00000011_00000010_00000001_00000000
pst.dec(lng.byte[0])
pst.Tab
pst.dec(lng.byte[1])
pst.tab
pst.dec(lng.byte[2])
pst.tab
pst.dec(lng.byte[3])
'********* code below does the byte swapping *************
lng1.byte[0]:=lng.byte[1]
lng1.byte[1]:=lng.byte[0]
lng1.byte[2]:=lng.byte[3]
lng1.byte[3]:=lng.byte[2]
lng.word[0]:=lng1.word[1]
lng.word[1]:=lng1.word[0]
'********************* end of code ******************************
pst.newline
pst.dec(lng.byte[0])
pst.Tab
pst.dec(lng.byte[1])
pst.tab
pst.dec(lng.byte[2])
pst.tab
pst.dec(lng.byte[3])
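As a sanity check of the byte and word moves in the Spin program, here is a rough Python model. It assumes the Propeller's little-endian layout (byte[0] is the least significant byte, word[0] the low word); spin_style_reverse is a made-up name for illustration.

```python
def spin_style_reverse(value):
    """Model the Spin byte/word assignments on a little-endian long."""
    lng = bytearray(value.to_bytes(4, "little"))   # lng.byte[0] is the low byte
    lng1 = bytearray(4)
    lng1[0] = lng[1]          # lng1.byte[0] := lng.byte[1]
    lng1[1] = lng[0]          # lng1.byte[1] := lng.byte[0]
    lng1[2] = lng[3]          # lng1.byte[2] := lng.byte[3]
    lng1[3] = lng[2]          # lng1.byte[3] := lng.byte[2]
    lng[0:2] = lng1[2:4]      # lng.word[0] := lng1.word[1]
    lng[2:4] = lng1[0:2]      # lng.word[1] := lng1.word[0]
    return int.from_bytes(lng, "little")

print(hex(spin_style_reverse(0x03020100)))
```

The first four moves swap the bytes inside each word and the last two swap the words, which together reverse all four bytes.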
Alas consecutive hub ops take 16 cycles each, rather than 4 for normal cog ops, so your 6 PASM instructions will take as long as 21 non-hub ops (on Prop 1 that is) - and you'd need to store a hub address constant in a register too.
Thank you for pointing that out about the extra clocks taken to write to the hub.
vic20man
It is actually much worse than Mark_T stated. You are running spin which is an interpreted language. Each bytecode (you may have a number of them for each instruction) takes more than 25 pasm instructions, and more likely ~50 or more. So we say spin is typically 50-100 times slower than pasm.
Another point is that pasm and spin cannot really be mixed in the same cog. In spin, the cog is running pasm code that is the interpreter, and it fills the cog. It is not impossible because a few of us have done this.
When we run LMM, the code is actually in hub, and copied instruction by instruction to cog for execution. Therefore in reality a rdlong takes no more than a mov in LMM mode because each LMM instruction takes at least 4 (in my case 5) instructions in a loop. That is why LMM mode runs at least 4-8 times slower than real pasm (i.e. pasm running within the cog).
Thank you for pointing out a few things about Spin and LMM code. Below is a PASM program that should work.
vic20man
CON
_clkmode = xtal1 + pll8x 'Standard clock mode * crystal frequency = 80 MHz
_xinfreq = 10_000_000
VAR
long cogReady
OBJ
pst : "Parallax Serial Terminal"
PUB main
pst.start(250_000)
pst.clear
cognew(@reverseBytesLng,@cogReady)
repeat while cogReady== 0
pst.hex(cogReady,8)
dat
org $0000
reverseBytesLng cogid idCog
mov addrssCogReady,par
'*****************code performs the long byte(s) reversal******************
rol lngValue,#16
mov lngValueCpy,lngValue
shl lngValue,#8
and lngValue,bitMsk1
shr lngValueCpy,#8
and lngValueCpy,bitMsk2
or lngValue,lngValueCpy
'****************************end of code********************************
wrlong lngValue,addrssCogReady
cogstop idCog
lngValue long $01_02_03_04
bitMsk1 long %11111111_00000000_11111111_00000000
bitMsk2 long %00000000_11111111_00000000_11111111
idCog res 1
addrssCogReady res 1
parCpy res 1
lngValueCpy res 1
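The seven instructions between the starred comments can be modelled in Python to confirm the result before loading a cog. This is only a sketch of that sequence, with reverse_bytes_masks as a made-up name; the two masks are the bitMsk1 and bitMsk2 values from the listing.

```python
MASK32 = 0xFFFFFFFF

def reverse_bytes_masks(value):
    """Model of the rol/shl/shr/and/or sequence from the PASM listing."""
    value = ((value << 16) | (value >> 16)) & MASK32   # rol lngValue,#16
    copy = value                                       # mov lngValueCpy,lngValue
    value = (value << 8) & MASK32                      # shl lngValue,#8
    value &= 0xFF00FF00                                # and lngValue,bitMsk1
    copy >>= 8                                         # shr lngValueCpy,#8
    copy &= 0x00FF00FF                                 # and lngValueCpy,bitMsk2
    return value | copy                                # or  lngValue,lngValueCpy

print(hex(reverse_bytes_masks(0x01020304)))
```

With lngValue = $01_02_03_04 the model gives $04030201, which is what pst.hex should print if the cog code behaves the same way.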
EDIT: Reworked as mentioned in post #17
rdbyte $180,#0
However, if you rework the code to use
y long $FF00FF00
this instruction becomes a nop because cccc=0000, which never executes (on both P1 and P2; it's a waitvid on P1 and a waitxxx on P2)
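The cccc=0000 claim can be checked by splitting the long into the P1 instruction fields (6-bit opcode, zcri flags, 4-bit condition, 9-bit destination, 9-bit source). A small Python sketch, with decode_p1 as a made-up helper name:

```python
def decode_p1(instr):
    """Split a 32-bit long into Prop 1 instruction fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,   # iiiiii
        "zcri":   (instr >> 22) & 0xF,    # z, c, r, i flags
        "cond":   (instr >> 18) & 0xF,    # cccc; %0000 means never execute
        "dest":   (instr >> 9) & 0x1FF,   # ddddddddd
        "src":    instr & 0x1FF,          # sssssssss
    }

fields = decode_p1(0xFF00FF00)
print({name: bin(bits) for name, bits in fields.items()})
```

For $FF00FF00 the opcode field is %111111 and the condition field is %0000, so the long is skipped if the cog ever executes it.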
This is what I ultimately did with my LMM posted in #8
I presume you mean by "called the pasm routine" a cognew/coginit to reload the cog. Otherwise, as you said, y is corrupted from the previous use.
Post #16 reworked