Preditor - Speed Help - PASM to the Rescue!!!
CassLan
Posts: 586
This is the routine that fills the DisplayBuffer with exactly what should be in the text editing window (i.e. whitespace is filled in as " ").
It takes the address of the start of a line, as well as the horizontal offset, and fills buffer[].
It works well but is slow. Granted, this does the whole editing window, while it probably only needs to do one line at a time while editing, but I thought I would ask some people who are more experienced than myself (most of you) if you see anything that would shave some clocks off this.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
NYC Area Prop Club
Prop Forum Search (Via Google)
Post Edited (CassLan) : 1/2/2010 11:23:13 PM GMT
PUB DisplayBufferFill(addr,horizoffset)
' Fills the Display Buffer for the editing window as it will display on screen
  counter := 0                  'index of Buffer array
  counter2 := 0                 'index of DisplayBuffer
  counter4 := 0                 'current cursor location on that line
  counter5 := 0                 'horizoffset counter
  counter6 := 0                 'horizontal EOL Flag
  '
  ' Lets start by determining what we COULD be running into
  ' With Horizontal Offset of 0
  ' 1) A single Character..followed by more characters
  ' 2) A single Character..followed by an EOL (Carriage Return)
  ' 3) An EOL..followed by more characters
  ' 4) An EOL..followed by another EOL
  '
  ' With a Horizontal Offset of some value
  ' 5) Skipping over Characters..to Display a single character followed by more characters
  ' 6) Skipping over Characters and EOL .. to Display nothing on that line
  ' 7) Skipping over EOL..to display nothing on that line
  '
  repeat until counter2 == DisplayBufferSize    ' do this until we have filled out display buffer
    counter6 := 0                               'reset horiz EOL Flag
    If HorizOffset <> 0 and counter4 == 0       'we have a horizoffset value to consider and we are at the beginning of the line
      'We are skipping over characters..we need to check what they are
      counter5:=0
      repeat until counter5 == horizoffset      ' do this for every horizontal offset value (character we are skipping)
        case sdcard.vbpeek(0,addr+counter)
          32..126:                              'we are skipping standard characters
            counter++                           'increment buffer index
            counter5++                          'increment horizoffset counter
          13:                                   ' we have come across an EOL before we are displaying any chars on this line
            if sdcard.vbpeek(0,addr+counter+1) == 10    'quick check to see if we have a linefeed (most likely)
              counter++                         ' if we do increment the Buffer index to skip over this we are now lined up on it
            repeat ((DisplayWidth-2)-counter4)  ' Fill the DisplayBuffer with spaces until the end of the display on that line
              buffer[counter2]:=32
              counter2++                        ' increment the DisplayBuffer index as we insert spaces
            counter++                           'increment the Buffer index (passed the linefeed/EOL)
            counter4:= 0                        'reset the CharactersOnLine counter (should,have been 0 anyway)
            'we should exit this loop at this point
            counter5 := horizoffset             ' we will no longer display characters on this line
            counter6 := 1                       'set this flag to skip the line char render since we are at a new line now
          other:                                ' for odd chars, we will treat just like standards for now
            counter++                           'increment buffer index
            counter5++                          'increment horizoffset counter
    If counter6 == 0
      case sdcard.vbpeek(0,addr+counter)
        32..126:                                'standard character
          buffer[counter2] := sdcard.vbpeek(0,addr+counter)     'place this chacter value in the Display Buffer
          counter++                             'increment the Buffer index
          counter2++                            'increment the DisplayBuffer index
          counter4++                            'increment the CharacterOnLine counter
        149:                                    'standard character
          buffer[counter2] := 15                'place this chacter value in the Display Buffer
          counter++                             'increment the Buffer index
          counter2++                            'increment the DisplayBuffer index
          counter4++                            'increment the CharacterOnLine counter
        13:
          if sdcard.vbpeek(0,addr+counter+1) == 10      'quick check to see if we have a linefeed (most likely)
            counter++                           ' if we do increment the Buffer index to skip over this we are now lined up on it
          repeat ((DisplayWidth-2)-counter4)    ' Fill the DisplayBuffer with spaces until the end of the display on that line
            buffer[counter2] := 32
            counter2++                          ' increment the DisplayBuffer index as we insert spaces
          counter++                             'increment the Buffer Index (now passed the LineFeed onto next character)
          counter4:=0                           'reset out CharactersOnLine counter back to 0..we have ended this line
        other:                                  ' Just to catch some unusual chars..we may end up getting them, we will treat as a standard char with a funny display to catch it
          buffer[counter2] := 127
          counter++                             'increment the Buffer index
          counter2++                            'increment the DisplayBuffer index
          counter4++                            'increment the CharacterOnLine counter
      ' Now we need to see if we just put the last character on a line that will fit
      if counter4 == (DisplayWidth-2)
        repeat until sdcard.vbpeek(0,addr+counter) == 13        ' we search until we find an EOL in the Buffer, the displaybuffer is already lines up for the next line
          counter++
        if sdcard.vbpeek(0,addr+counter+1) == 10        'quick check to see if we have a linefeed (most likely)
          counter++                             'if we do increment the Buffer index to skip over this we are now lined up on it
        counter++                               'increment the Buffer index
        counter4:=0                             'reset the CharacterOnLine counter
Comments
There are a few places where the optimisation (above) on buffer[counter2++] := xxxx can be used. There is a specific optimisation (shortcut) in the Spin interpreter that does not use an extra instruction to do pre- and post-incrementing, although it does depend on how it is being used.
Since you are doing a sdcard.vbpeek quite often, this could also be a place to look.
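As a small illustration of the inline increment mentioned above, here is a sketch of the space-padding loop written both ways. The buffer size and the method name are made up for the example; only buffer and counter2 come from the original routine.

```spin
VAR
  byte buffer[64]                  ' hypothetical size, for illustration only
  long counter2                    ' index into buffer

PUB PadTwoWays(n)
  ' Keep 2 * n at or below 64 so both loops stay inside buffer.
  counter2 := 0

  ' Separate statements: the store and the increment are two Spin operations.
  repeat n
    buffer[counter2] := 32
    counter2++

  ' Post-increment folded into the index expression: one statement per pass.
  repeat n
    buffer[counter2++] := 32
```

Both loops write n spaces and leave counter2 advanced by n; the second form just gives the interpreter less to do on each pass.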
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:
· Home of the MultiBladeProps: TriBlade, RamBlade, SixBlade, website
· Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator)
· Prop Tools under Development or Completed (Index)
· Emulators: CPUs Z80 etc; Micros Altair etc; Terminals VT100 etc; (Index) ZiCog (Z80), MoCog (6809)
· Search the Propeller forums (uses advanced Google search)
My cruising website is: www.bluemagic.biz   MultiBladeProp is: www.bluemagic.biz/cluso.htm
I think you should rearrange this code to take that into account. That is, when "skipping over" anything or "scanning" for anything
or "copying" anything, you are just going character by character forward; keep a pointer to a 512-byte buffer and a count of how
many characters "remain" in the buffer to be consumed, and this way you can eliminate more than 99% of the vbpeek() calls.
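A minimal sketch of that pointer-plus-remaining-count idea follows. All names here (CHUNK_SIZE, chunkBuf, NextChar, LoadBlock) are hypothetical, and LoadBlock is only a placeholder that fills the chunk with spaces so the sketch compiles on its own; the real version would do one block read through the storage driver.

```spin
CON
  CHUNK_SIZE = 512                 ' one sector-sized chunk, as suggested above

VAR
  byte chunkBuf[CHUNK_SIZE]        ' local copy of part of the file
  long chunkPtr                    ' hub address of the next unread byte
  long remain                      ' bytes left to consume in chunkBuf
  long fileAddr                    ' file offset used for the next refill

PUB Open(startAddr)
  fileAddr := startAddr
  remain := 0                      ' forces a refill on the first NextChar

PUB NextChar : c
  ' Returns the next character; the driver is touched once per CHUNK_SIZE bytes.
  if remain == 0
    Refill
  c := byte[chunkPtr++]
  remain--

PRI Refill
  LoadBlock(@chunkBuf, fileAddr, CHUNK_SIZE)
  fileAddr += CHUNK_SIZE
  chunkPtr := @chunkBuf
  remain := CHUNK_SIZE

PRI LoadBlock(dest, src, count)
  ' Placeholder (assumption): swap in the real block read from the storage driver.
  bytefill(dest, 32, count)
```

With this shape, skipping, scanning and copying all call NextChar, so the per-character cost becomes a RAM read plus a counter decrement rather than a driver call.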
If you have the RAM, the way I'd approach it is to have a buffer large enough to hold the maximum number of characters you can ever display on the screen (or, if you don't have that much RAM, on a line, though that will slow things down and complicate them a bit). Have a function that uses block accesses to load the buffer, then scan the buffer to draw the screen with word wrap and all that.
On the Prop I'd also concentrate on using my control logic to identify and locate words, but once I'd identified and located a word I'd use bytemove (which is way faster than byte-by-byte access in Spin) to actually stuff it into the video buffer.
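Roughly, that split between Spin control logic and bytemove could look like the sketch below. The buffer names and sizes (lineBuf, screen, 128, 40) are invented for the example, and a "word" is taken to be a run of characters above the space code.

```spin
VAR
  byte lineBuf[128]                ' hypothetical source text
  byte screen[40]                  ' hypothetical one-line video buffer

PUB CopyWord(srcIdx, dstIdx) : len
  ' Control logic in Spin: measure the word starting at srcIdx.
  ' len is the result variable, so it starts at 0.
  repeat while srcIdx + len < 128
    if lineBuf[srcIdx + len] =< 32       ' space or control character ends the word
      quit
    len++

  ' Clamp so the copy cannot run past the end of the line buffer.
  len <#= 40 - dstIdx

  ' Bulk copy in one call instead of one Spin statement per byte.
  if len > 0
    bytemove(@screen[dstIdx], @lineBuf[srcIdx], len)
```

The scan is still byte-by-byte, but each byte is only read once in Spin; the store side is handled entirely by bytemove.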
I was thinking that the pre-evaluation might be helpful, and the ++ inline with the command as well. It's easier for me to read the way it is now, but I will try that and see what kind of benefits it yields.
rokicki:
Hmmmm, I like what you're saying. I don't think, though, that every vbpeek call transfers 512 bytes from the card. I believe there is some logic in that driver that has its own sector-sized buffer, which is checked in case what you're looking for is in the sector currently in memory. It's essentially exactly what you're talking about. I will ask MagIO about that.
localroger:
Gotcha. I will see about that approach.
Thanks for the input, I need to do some timing tests, but I'm pretty sure the major delay is in the displaying.
Will keep you posted!
Rick
And yes, you are right. The vMem has a 512-byte buffer per slot and only reloads if necessary. I implemented the slots with your editor in mind and suggest using 1 slot for RHB, 1 slot for LHB and 1 slot for filling the display buffer. (The other one can then be used for copy and paste.)
I guess your lines can be longer than the number of characters of a row?
So you can have up to 3 sectors of data displayed on your screen.
I'll have a closer look at your code this evening.
Thanks,
Rick
is equivalent to
The latter should be a lot faster.
-Phil
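The two snippets this post compared did not survive in the thread. Going by the bytefill reply that follows and the space-padding loops in the original routine, the comparison may have been along these lines. This is a guess at the idea, not the original text; the DisplayWidth value and the buffer size are made up, and the count is assumed to be non-negative.

```spin
CON
  DisplayWidth = 40                ' hypothetical; the real value is not shown in the thread

VAR
  byte buffer[64]                  ' hypothetical size
  long counter2, counter4

PUB PadLoop
  ' Byte-at-a-time padding, as in the original routine.
  repeat ((DisplayWidth-2)-counter4)
    buffer[counter2] := 32
    counter2++

PUB PadFill | n
  ' Same result with one built-in call and one addition.
  n := (DisplayWidth-2)-counter4
  bytefill(@buffer[counter2], 32, n)
  counter2 += n
```

Both methods write the same spaces and leave counter2 at the same value; bytefill does the fill inside the interpreter rather than one Spin statement per byte.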
At first I was using the vMem as the display buffer, which does not have a bytefill function. Then I switched it to use RAM and missed that.
Thanks!
The goal is to read the pre-filled GAP buffer, which contains the file contents as-is except for being broken into two halves.
When I initially started this thread I was doing the following:
A) Filling a byte Display Buffer, which is sized to fit the editing window, with the contents of the GAP Buffer, taking into account EOLs, lines being longer than the screen, etc.
B) Taking that Display Buffer and feeding it to the display driver one character at a time
The results:
A ~ 19mil clocks, B ~ 12mil clocks, total ~ 31mil
Then I tweaked B..Feeding Lines instead of characters...B ~ 7mil, total ~ 26mil
I further tweaked B..Opting to directly edit the Screen Buffer memory instead of using the Display Driver functions...B ~ 3.5mil, total ~ 22.5mil
At this point it occurred to me to be done with the usage of my own Display Buffer, and have (A) just fill the Display Driver's screen memory directly...A ~ 23mil, B ~ 0, total ~ 23mil
The performance increase was really none, except that I freed up my general purpose buffer, which I had been keeping pretty small until I wanted to use it as a display buffer.
So now I was able to shrink that back down to 256 from 1240..almost 1kB of RAM freed!
But I really wasn't happy with the results, because at this point the screen wouldn't actually update very fast, AND you could see it writing out the chars instead of the whole screen appearing to update at once.
So, now that I had a free general purpose buffer at my disposal, and since it was suggested (thanks rokicki and localroger), I used that as the input for (A) instead of the direct calls to MagIO's vMem functions, refilling the buffer as needed.
And the results are: 10mil clocks!!!!
So in short I started at 31mil...and ended up at 10mil + 984 bytes lighter!!!
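For anyone wanting to collect clock counts like the ones above, one common way is to snapshot the system counter before and after the call. The sketch below assumes an 80 MHz setup; WorkUnderTest and scratch are hypothetical stand-ins for the routine being measured.

```spin
CON
  _clkmode = xtal1 + pll16x        ' assumed board setup: 5 MHz crystal, 80 MHz system clock
  _xinfreq = 5_000_000

VAR
  long elapsed
  byte scratch[100]

PUB Main
  elapsed := TimeIt                ' inspect or print elapsed however you like

PUB TimeIt : clocks
  clocks := cnt                    ' snapshot the free-running system counter
  WorkUnderTest                    ' stand-in for the routine being measured
  clocks := cnt - clocks           ' elapsed clocks; valid for spans under 2^32 clocks

PRI WorkUnderTest | i
  repeat i from 0 to 99
    scratch[i] := 32
```

The result includes the small overhead of the method call and the two cnt reads, which is negligible next to counts in the millions.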
Thanks for everyone's suggestions, I have a feeling I can get this down to 6mil... :)
Rick
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
http://www.propgfx.co.uk/forum/ home of the PropGFX Lite
~440,000 clocks!!! That's over 180 possible refreshes per second!!!
And that includes reading all the data for a whole screen from the GAP buffer (SD card via FSRW and MagIO's vMem functions), parsing/finding ends of lines, horizontal offsetting, etc.!!!
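The refreshes-per-second figure is consistent with an 80 MHz system clock, which is an assumption here since the thread does not state the clock setup. The arithmetic as a small sketch, with hypothetical method names:

```spin
CON
  _clkmode = xtal1 + pll16x        ' assumption: 5 MHz crystal x 16 = 80 MHz
  _xinfreq = 5_000_000

PUB Example : rate
  rate := RefreshesPerSecond(440_000)    ' 80_000_000 / 440_000 = 181 (remainder dropped)

PUB RefreshesPerSecond(clocksPerRefresh) : rate
  rate := clkfreq / clocksPerRefresh     ' integer division
```

At a different clock frequency the same division applies, with clkfreq picking up whatever the board is actually running at.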
It does everything that my Spin code did, just A LOT faster.
The cog starts with the program, and when it sees a certain variable in main RAM equal 1, it knows that the buffer in main RAM is ready for it and it just rips through it :)
Then when it's done it sets a separate variable in main RAM so that Spin can know that it's done :)
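The actual PASM was not posted, so the following is only a sketch of the two-flag handshake described above, not Rick's code. The names ready, done, entry and poll are hypothetical, and the buffer processing itself is left as a comment.

```spin
VAR
  long ready                       ' Spin sets this to 1 when the buffer may be processed
  long done                        ' the cog sets this to 1 when it has finished
  ' ready and done must stay adjacent and in this order: the cog finds done at ready+4

PUB Start : ok
  ready := 0
  done := 0
  ok := cognew(@entry, @ready) + 1 ' nonzero if a cog was started

PUB Refresh
  done := 0
  ready := 1                       ' hand the buffer to the cog
  repeat until done == 1           ' wait for the cog to report completion

DAT
              org     0
entry         mov     readyAddr, par          ' par holds the hub address of ready
              mov     doneAddr, par
              add     doneAddr, #4            ' done is the next long in hub RAM

poll          rdlong  flag, readyAddr         ' read the request flag from hub RAM
              cmp     flag, #1        wz      ' Z is set when ready == 1
        if_nz jmp     #poll                   ' not ready yet, keep polling

              ' ... process the buffer here ...

              mov     flag, #0
              wrlong  flag, readyAddr         ' clear the request
              mov     flag, #1
              wrlong  flag, doneAddr          ' tell Spin the work is finished
              jmp     #poll

readyAddr     res     1
doneAddr      res     1
flag          res     1
```

Clearing ready before setting done means Spin can safely issue the next Refresh as soon as it sees done == 1.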
I have a feeling that screen display/refresh speed will no longer be an issue
I also have a sneaking suspicion that this code can be made even faster, but I'm happy with the speed as is
I have to thank Parallax for the awesome PASM Webinar: http://www.parallax.com/Portals/0/Downloads/mm/video/Webinar/2009-12-10-Webinar-[Full].mp4
Which kind of gave me a push to do it.
And it would NOT have been possible for me to write this without Ariba's Propeller Assembler Source-code Debugger: http://propeller.wikispaces.com/PASD
It's a nice, lightweight, free tool, the instructions are good, and I was able to use it to help me within 30 seconds of downloading it. Thanks Ariba!!
Well, let's see now,
Storage is not a problem, Speed is not a problem, Memory Management is not a problem...I guess I have no more excuses guys :)
Till later,
Rick
OBC
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
New to the Propeller?
Visit the: The Propeller Pages @ Warranty Void.