Preditor - Speed Help - PASM to the Rescue!!!



CassLan
12-27-2009, 10:16 PM
This is the routine that fills the DisplayBuffer with exactly what should be in the text editing window (i.e. whitespace is filled with " ").
It takes the address of the start of a line, as well as the horizontal offset, and fills buffer[].
It works well but is slow. Granted, this fills the whole editing window, while it probably only needs to do one line at a time while editing, but I thought I would ask some people who are more experienced than myself (most of you, lol) if you see anything that would shave some clocks off this.


PUB DisplayBufferFill(addr,horizoffset)
' Fills the Display Buffer for the editing window as it will display on screen
  counter := 0 'index of Buffer array
  counter2 := 0 'index of DisplayBuffer
  counter4 := 0 'current cursor location on that line
  counter5 := 0 'horizoffset counter
  counter6 := 0 'horizontal EOL Flag
  '
  ' Let's start by determining what we COULD be running into
  ' With Horizontal Offset of 0
  ' 1) A single Character..followed by more characters
  ' 2) A single Character..followed by an EOL (Carriage Return)
  ' 3) An EOL..followed by more characters
  ' 4) An EOL..followed by another EOL
  '
  ' With a Horizontal Offset of some value
  ' 5) Skipping over Characters..to Display a single character followed by more characters
  ' 6) Skipping over Characters and EOL .. to Display nothing on that line
  ' 7) Skipping over EOL..to display nothing on that line
  '
  repeat until counter2 == DisplayBufferSize ' do this until we have filled our display buffer
    counter6 := 0 'reset horiz EOL Flag
    if horizoffset <> 0 and counter4 == 0
      'we have a horizoffset value to consider and we are at the beginning of the line
      'We are skipping over characters..we need to check what they are
      counter5 := 0
      repeat until counter5 == horizoffset ' do this for every horizontal offset value (character we are skipping)
        case sdcard.vbpeek(0,addr+counter)
          32..126: 'we are skipping standard characters
            counter++ 'increment buffer index
            counter5++ 'increment horizoffset counter
          13: ' we have come across an EOL before we are displaying any chars on this line
            if sdcard.vbpeek(0,addr+counter+1) == 10 'quick check to see if we have a linefeed (most likely)
              counter++ ' if we do, increment the Buffer index so we are lined up on the linefeed
            repeat ((DisplayWidth-2)-counter4) ' Fill the DisplayBuffer with spaces until the end of the display on that line
              buffer[counter2] := 32
              counter2++ ' increment the DisplayBuffer index as we insert spaces
            counter++ 'increment the Buffer index (past the linefeed/EOL)
            counter4 := 0 'reset the CharactersOnLine counter (should have been 0 anyway)
            'we should exit this loop at this point
            counter5 := horizoffset ' we will no longer display characters on this line
            counter6 := 1 'set this flag to skip the line char render since we are at a new line now
          other: ' for odd chars, we will treat just like standards for now
            counter++ 'increment buffer index
            counter5++ 'increment horizoffset counter

    if counter6 == 0
      case sdcard.vbpeek(0,addr+counter)
        32..126: 'standard character
          buffer[counter2] := sdcard.vbpeek(0,addr+counter) 'place this character value in the Display Buffer
          counter++ 'increment the Buffer index
          counter2++ 'increment the DisplayBuffer index
          counter4++ 'increment the CharacterOnLine counter
        149: 'character 149 is displayed as glyph 15
          buffer[counter2] := 15 'place glyph 15 in the Display Buffer
          counter++ 'increment the Buffer index
          counter2++ 'increment the DisplayBuffer index
          counter4++ 'increment the CharacterOnLine counter
        13:
          if sdcard.vbpeek(0,addr+counter+1) == 10 'quick check to see if we have a linefeed (most likely)
            counter++ ' if we do, increment the Buffer index so we are lined up on the linefeed
          repeat ((DisplayWidth-2)-counter4) ' Fill the DisplayBuffer with spaces until the end of the display on that line
            buffer[counter2] := 32
            counter2++ ' increment the DisplayBuffer index as we insert spaces
          counter++ 'increment the Buffer Index (now past the LineFeed onto next character)
          counter4 := 0 'reset our CharactersOnLine counter back to 0..we have ended this line
        other: ' Just to catch some unusual chars..we may end up getting them, we will treat as a standard char with a funny display to catch it
          buffer[counter2] := 127 'placeholder glyph for unusual characters
          counter++ 'increment the Buffer index
          counter2++ 'increment the DisplayBuffer index
          counter4++ 'increment the CharacterOnLine counter
      ' Now we need to see if we just put the last character on a line that will fit
      if counter4 == (DisplayWidth-2)
        repeat until sdcard.vbpeek(0,addr+counter) == 13 ' we search until we find an EOL in the Buffer, the displaybuffer is already lined up for the next line
          counter++
        if sdcard.vbpeek(0,addr+counter+1) == 10 'quick check to see if we have a linefeed (most likely)
          counter++ 'if we do, increment the Buffer index so we are lined up on the linefeed
        counter++ 'increment the Buffer index
        counter4 := 0 'reset the CharacterOnLine counter


▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


NYC Area Prop Club (http://www.gothampropclub.com)

Prop Forum Search (Via Google) (http://search.parallax.com/search?site=parallax&client=parallax&output=xml_no_dtd&proxystylesheet=parallax&proxycustom=<HOME/>&ie=&oe=&lr=)



Post Edited (CassLan) : 1/2/2010 11:23:13 PM GMT

Cluso99
12-27-2009, 11:03 PM
CassLan: From my understanding of the implementation of Spin, expressions like this can be sped up...


repeat ((DisplayWidth-2)-counter4) ' Fill the DisplayBuffer with spaces until the end of the display on that line
  buffer[counter2]:=32
  counter2++ ' increment the DisplayBuffer index as we insert spaces


-----------------------------------------------------

tmp := (DisplayWidth-2)-counter4 ' faster to preevaluate ???? (unsure)
repeat tmp ' Fill the DisplayBuffer with spaces until the end of the display on that line
  buffer[counter2++]:=32 ' postincrement




There are a few places where the optimisation above, buffer[counter2++] := xxxx, can be used. The Spin interpreter has a specific shortcut that does pre- and post-incrementing without an extra instruction, although it does depend on how it is being used.

Since you are doing a sdcard.vbpeek quite often, this could also be a place to look.





▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:

· Home of the MultiBladeProps: TriBlade (http://forums.parallax.com/showthread.php?p=786418), RamBlade (http://forums.parallax.com/showthread.php?p=849265), SixBlade (http://forums.parallax.com/showthread.php?p=780033), website (http://bluemagic.biz/cluso.htm)
· Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator) (http://forums.parallax.com/showthread.php?p=790917)
· Prop Tools under Development or Completed (Index) (http://forums.parallax.com/showthread.php?p=753439)
· Emulators: CPUs Z80 etc; Micros Altair etc; Terminals VT100 etc; (Index) (http://forums.parallax.com/showthread.php?p=778427) ZiCog (Z80) (http://forums.parallax.com/showthread.php?p=788511), MoCog (6809) (http://forums.parallax.com/showthread.php?p=811043)
· Search the Propeller forums (http://search.parallax.com/search?site=parallax&client=parallax&output=xml_no_dtd&proxystylesheet=parallax&proxycustom=<HOME/>&ie=&oe=&lr=) (uses advanced Google search)
My cruising website is: www.bluemagic.biz (http://www.bluemagic.biz)
MultiBladeProp is: www.bluemagic.biz/cluso.htm (http://www.bluemagic.biz/cluso.htm)

rokicki
12-28-2009, 02:07 AM
Based on the way you are using virtual memory, and the knowledge that the SD card always moves chunks of 512 bytes in and out,
I think you should rearrange this code to take that into account. That is, when "skipping over" anything or "scanning" for anything
or "copying" anything, you are just going character by character forward; keep a pointer to a 512-byte buffer and a count of how
many characters "remain" in the buffer to be consumed, and this way you can eliminate more than 99% of the vbpeek() calls.
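
A minimal sketch of the buffering described above, in Spin. Every name in it (SECTOR, chunk, StartRead, NextByte, LoadBlock) is hypothetical and not part of vMem or FSRW. LoadBlock in particular is only a stand-in that copies from hub RAM; it would be replaced with whatever block read the storage driver actually provides.

```spin
CON
  SECTOR = 512                        ' the SD card moves data in 512-byte chunks

VAR
  long nextAddr                       ' source address of the next chunk to load
  long bufPtr                         ' hub address of the next unread byte
  long remain                         ' unread bytes left in chunk[]
  byte chunk[SECTOR]                  ' local copy of the current chunk

PUB StartRead(addr)
  nextAddr := addr
  remain := 0                         ' forces a refill on the first NextByte call

PUB NextByte : c
  if remain == 0                      ' buffer consumed: fetch the next chunk
    LoadBlock(@chunk, nextAddr, SECTOR)
    nextAddr += SECTOR
    bufPtr := @chunk
    remain := SECTOR
  c := byte[bufPtr++]                 ' plain hub RAM read, no driver call
  remain--

PRI LoadBlock(dest, src, n)
  ' HYPOTHETICAL placeholder: here the "storage" is just hub RAM.
  ' Replace the body with the driver's real block read.
  bytemove(dest, src, n)
```

With something like this in place, each per-character peek in the fill loop would become a NextByte call, so the storage driver is touched once per 512 bytes instead of once per character. The lookahead for a linefeed after a carriage return would need a small peek helper, or the 10 could simply be skipped when it arrives.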

localroger
12-28-2009, 06:02 AM
To expand on what rokicki said ... this is a function I've coded at least 4 times since the first time I did it for a C64. The way you did it has the advantage of being simple and easy to debug with small code size, but it is very slow.

If you have the RAM, the way I'd approach it is to have a buffer large enough to hold the maximum number of characters you can ever display on the screen (or, if you don't have that much RAM, on a line, though that will slow things down and complicate them a bit). Have a function that uses block accesses to load the buffer, then scan the buffer to draw the screen with word wrap and all that.

On the Prop I'd also concentrate on using my control logic to identify and locate words, but once I'd identified and located a word I'd use bytemove (which is way faster than byte-by-byte access in Spin) to actually stuff it into the video buffer.
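
A rough sketch of that idea in Spin, assuming the source text is already sitting in hub RAM. CopyRun and its parameters are hypothetical names: the control logic only measures how long the run of printable characters is, and a single bytemove does the copying.

```spin
PUB CopyRun(src, dst, maxLen) : n
  ' Count consecutive printable characters (32..126), at most maxLen of them
  n := 0
  repeat while n < maxLen
    if byte[src + n] < 32 or byte[src + n] > 126
      quit                            ' stop at an EOL or any other non-printable byte
    n++
  bytemove(dst, src, n)               ' one block copy instead of n byte assignments
```

A caller would advance both its source index and its display index by the returned n, then deal with whatever byte stopped the run (typically the carriage return).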

CassLan
12-28-2009, 11:04 AM
Cluso:
I was thinking that the pre-evaluation might be helpful, and the inline ++ as well. It's easier for me to read the way it is now, but I will try that and see what kind of benefit it yields.

rokicki:
Hmmmm, I like what you're saying. I don't think, though, that every vbpeek call transfers 512 bytes from the card. I believe there is some logic in that driver that has its own sector-sized buffer, which is checked in case what you're looking for is in the sector currently in memory. It's essentially exactly what you're talking about. I will ask MagIO about that.

localroger:
Gotcha. I will see about that approach.

Thanks for the input, I need to do some timing tests, but I'm pretty sure the major delay is in the displaying.
Will keep you posted!
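
For the timing tests, one common approach on the Propeller is to read the system counter before and after the call. This is only a sketch: TimeFill is a hypothetical wrapper, assumed to live in the same object as DisplayBufferFill from the first post.

```spin
PUB TimeFill(addr, horizoffset) : clocks
  clocks := cnt                       ' system counter before the call
  DisplayBufferFill(addr, horizoffset)
  clocks := cnt - clocks              ' elapsed system clocks, including a little call overhead
```

Because cnt is a free-running 32-bit counter, the subtraction stays valid across a wrap as long as the measured interval is shorter than 2^31 clocks.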

Rick


Cluso99
12-28-2009, 11:13 AM
CassLan: I don't like unreadable code either, but sometimes it is a must, so good comments are required (like post increment)


MagIO2
12-28-2009, 09:13 PM
You already mentioned the best improvement: Only update what needs to be updated.

And yes, you are right. The vMem has a 512-byte buffer per slot and only reloads if necessary. I implemented the slots with your editor in mind and suggest using 1 slot for RHB, 1 slot for LHB and 1 slot for filling the display buffer. (The other one can then be used for copy and paste.)

I guess your lines can be longer than the number of characters in a row?
So you can have up to 3 sectors of data displayed on your screen.

I'll have a closer look at your code this evening.

CassLan
12-29-2009, 09:47 AM
I shaved the display portion (for an entire editing window) from 12mil to 7mil clocks. I'm pretty sure that updating the minimum amount of data AND display is all that needs to be done. I will keep you all posted.

Thanks,

Rick


Phil Pilgrim (PhiPi)
12-29-2009, 11:38 AM
repeat ((DisplayWidth-2)-counter4) ' Fill the DisplayBuffer with spaces until the end of the display on that line
  buffer[counter2]:=32
  counter2++ ' increment the DisplayBuffer index as we insert spaces




is equivalent to




bytefill(@buffer[counter2], 32, DisplayWidth - 2 - counter4)
counter2 += DisplayWidth - 2 - counter4 'This line may not be necessary, depending on what the subsequent code expects.




The latter should be a lot faster.

-Phil

CassLan
12-29-2009, 08:08 PM
Of course Phil!!!
At first I was using the vMem as the display buffer, which does not have a bytefill function, then I switched it to use RAM and missed that.

Thanks!


CassLan
12-31-2009, 09:46 PM
I have made progress and wanted to share some of the re-implementations I have gone through along with the results:

The goal is to read the pre-filled GAP buffer, which contains the file contents as-is except that they are broken into two halves.

When I initially started this thread I was doing the following:
A) Filling a byte Display Buffer, sized to fit the editing window, with the contents of the GAP Buffer, taking into account EOLs, lines being longer than the screen, etc.
B) Taking that Display Buffer and feeding it to the display driver one character at a time
The results:
A ~ 19mil clocks, B ~ 12mil clocks, total ~ 31mil

Then I tweaked B..Feeding Lines instead of characters...B ~ 7mil, total ~ 26mil
I further tweaked B..Opting to directly edit the Screen Buffer memory instead of using the Display Driver functions...B ~ 3.5mil, total ~ 22.5mil

At this point it occurred to me to be done with my own Display Buffer and have (A) fill the Display Driver's screen memory directly...A ~ 23mil, B ~ 0, total ~ 23mil
The performance increase was really none, except that I freed up my general purpose buffer, which I had been keeping pretty small until I wanted to use it as a display buffer.
So now I was able to shrink that back down to 256 from 1240 bytes..almost 1kB of RAM freed!

But I really wasn't happy with the results, because at this point the screen wouldn't actually update very fast, AND you could see it writing out the chars instead of the whole screen appearing to update at once.

So, now that I had a free general purpose buffer at my disposal, and since it was suggested (thanks rokicki and localroger), I used that as the input for (A) instead of the direct calls to MagIO's vMem functions, refilling the buffer as needed.
And the results are: 10mil clocks!!!!

So in short I started at 31mil...and ended up at 10mil + 984 bytes lighter!!!

Thanks for everyone's suggestions, I have a feeling I can get this down to 6mil... :)

Rick


Baggers
12-31-2009, 10:55 PM
Nice going Rick, Keep up the great work :)

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
http://www.propgfx.co.uk/forum/ home of the PropGFX Lite

CassLan
01-03-2010, 06:20 AM
I'm not sure why it took me so long to try it, but I wrote my first PASM code today to help with the speed, and it brought the parsing/display speed from ~10 mil clocks (after sweating for days with it) to an astounding ....

~440,000 clocks!!! That's over 180 possible refreshes per second!!!
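
The refresh figure is just the system clock divided by the clocks per refresh. The thread does not state the clock frequency; the sketch below assumes the common 80 MHz setup (5 MHz crystal with PLL16X), and RefreshesPerSecond is a hypothetical helper name.

```spin
PUB RefreshesPerSecond(clocksPerRefresh) : rate
  ' clkfreq is the current system clock in Hz (assumed to be 80_000_000 here)
  rate := clkfreq / clocksPerRefresh  ' integer division
```

At 80 MHz, 80_000_000 / 440_000 comes out at 181 in integer arithmetic, which matches the "over 180" estimate.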

And that includes reading all the data for a whole screen from the GAP buffer (SD card via FSRW and MagIO's vMem functions), parsing/finding ends of lines, horizontal offsetting, etc!!!
It does everything that my Spin code did, just A LOT faster.

The cog starts with the program, and when it sees a certain variable in main RAM equal 1, it knows that the buffer in main RAM is ready for it and it just rips through it :)

Then when it's done it sets a separate variable in main RAM so that Spin can know that it's done :)
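
A bare-bones sketch of that kind of handshake, for anyone wanting to try the same thing. This is not the actual Preditor code: the mailbox layout and the names (mailbox, Start, Run, entry, poll) are assumptions, and the parsing itself is left out.

```spin
VAR
  long mailbox[3]                     ' [0] go flag, [1] done flag, [2] buffer address

PUB Start(bufAddr) : okay
  mailbox[0] := 0
  mailbox[1] := 0
  mailbox[2] := bufAddr
  okay := cognew(@entry, @mailbox) + 1  ' 0 means no cog was free

PUB Run
  mailbox[1] := 0                     ' clear "done"
  mailbox[0] := 1                     ' tell the cog the buffer is ready
  repeat until mailbox[1]             ' wait for the cog to report back

DAT
              org       0
entry         mov       goPtr, par          ' par = hub address of mailbox[0]
              mov       donePtr, par
              add       donePtr, #4         ' mailbox[1]
              mov       bufPtr, par
              add       bufPtr, #8          ' mailbox[2]

poll          rdlong    t1, goPtr wz        ' read the go flag, Z set if it is 0
        if_z  jmp       #poll
              rdlong    src, bufPtr         ' fetch the buffer address
              ' ... parsing / screen fill would go here ...
              mov       t1, #0
              wrlong    t1, goPtr           ' clear "go"
              mov       t1, #1
              wrlong    t1, donePtr         ' signal "done" to Spin
              jmp       #poll

goPtr         res       1
donePtr       res       1
bufPtr        res       1
src           res       1
t1            res       1
```

The cog clears the go flag before it sets the done flag, so by the time Spin sees "done" and calls Run again the cog is already back in its polling loop.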

I have a feeling that screen display/refresh speed will no longer be an issue.
I also have a sneaking suspicion that this code can be made even faster, but I'm happy with the speed as is.

I have to thank Parallax for the awesome PASM Webinar: http://www.parallax.com/Portals/0/Downloads/mm/video/Webinar/2009-12-10-Webinar-[Full].mp4
Which kind of gave me a push to do it.
And it would NOT have been possible for me to write this without Ariba's Propeller Assembler Source-code Debugger: http://propeller.wikispaces.com/PASD
It's a nice, lightweight, free tool, the instructions are good, and I was able to use it to help me within 30 seconds of downloading it. Thanks Ariba!!

Well, let's see now,
Storage is not a problem, Speed is not a problem, Memory Management is not a problem...I guess I have no more excuses guys :)

Till later,

Rick








Oldbitcollector (Jeff)
01-03-2010, 08:51 AM
Sounds Great Rick!!!!! Looking forward to Preditor!

OBC

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
New to the Propeller?

Visit the: The Propeller Pages (http://www.warrantyvoid.us/tiki-index.php?page=Propeller) @ Warranty Void.

mpark
01-03-2010, 08:51 AM
Kudos, Rick! That's great news.