Video Processing (going the other way)
Wurlitzer
Posts: 237
Anybody been working on using this great Micro to process images from a video source rather than generate them?
I have an application that would have to look at high-contrast flags oriented as 2 rows of 65, and determine one of 3 vertical positions. Each flag would have one flag-width space between them, so horizontal separation should be sufficient, and I would make sure the vertical displacement would move the image at least 5 horizontal scan lines up or down. I don't need to know the exact vertical position, just that it moved far enough to reach one of the 3 states.
IE: If (Position1 <= 15, Position2 = 15 to 19, Position3 >= 20) and the top of Flag #42 is at horizontal scan line #12, it is position 1.
I was thinking of making the flag height tall enough to allow the video processing to self-clock. In other words, no matter what vertical position the flags are in, there would be a horizontal scan line number where all flags for a row would be seen and therefore counted. In the example above, maybe scan line 16 would see all the flags.
I would like to process the image ideally at an interlaced field rate of 60/second or, worst case, a full-frame rate of 30/second. Any slower and the application would not work.
What does the brain trust think?
Comments
With 32K of RAM, there's not a lot of buffer space for something as rich as a camera image, as you are aware. If some kind of object detection could be performed as the data was input, which is what you are proposing, the buffering requirements could be reduced by a lot. This would make it all possible and practical. I think you could achieve this.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Chip Gracey
Parallax, Inc.
My first vision app was for a huge sorting machine that measured the diameter of dwarf fruit tree stock at a nursery and dropped the trees into bins, based on size. It used a Z80 and captured a 256x256x6-bit frame from a B/W TV camera. The video buffer required 64K of RAM, but the processing requirements were pretty modest, and the Z80 could keep up easily.
From there I specialized in linescan imagers, which are cameras with a single row of pixels -- in my case 256x1 or 128x1. Linescan imagers are the things to use when the subjects being observed are moving, such as on a production line. That way, successive images can be acquired to fill in the second dimension missing from the sensor itself. And, as Chip suggests, processing on the fly is the way to eliminate huge buffer requirements.
These sensors were (and still are) being used in the produce-packing industry for sizing fruit and vegetables. Other applications include detecting the orientation of empty liquor bottles prior to filling (so the label gets put on the right side) and measuring the widths of boards going through an automatic saw. These apps use PIC microcontrollers -- again, nothing fancy.
One of the biggest hurdles to overcome in machine vision apps is lighting. If you can control the lighting and optimize it for the type of sensing you're doing, you've done 90% of the work and made the other 10% easier. Seriously. There are many resources on the web that discuss lighting for machine vision. One of the better introductions can be found here: dolan-jenner.com/jenner/equipment/guide.asp.
Now, how does the Propeller fit into all of this? The Propeller has some unique characteristics that make it suitable for some machine vision apps. Being able to display what it's looking at, given some sort of image input, is a big plus for debugging. Granted, 32K of RAM isn't enough to hold a frame of VGA (640x480) or even CIF (352x288). But you don't have to for a large class of useful vision apps. A lot can be accomplished either at coarser resolutions or by computing on the fly. One thing I expect the Propeller, with its multiple processors, to shine at is image data processing. But I haven't yet gotten that far in my investigations.
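To put the RAM constraint in numbers: even at 8 bits per pixel, a CIF frame is 352 x 288 = 101,376 bytes, more than three times the Propeller's 32K, while a single CIF line is only 352 bytes.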
I expect to have a lot more to say on the subject in the near future. For now, suffice it to say that the Propeller looks like a good candidate to form the foundation of a modest, but extremely useful, machine vision system.
-Phil
In my application, the flags will always be in a fixed horizontal position and have 3 possible vertical positions. The background will be solid black and the flags white.
The processing requirements would be as follows:
1. Determine the end of the vertical sync pulse.
2. Count the horizontal scan lines.
3. Determine the position within a single horizontal scan where the flags should appear. This might be hard-coded, or it would be great to be self-detecting.
4. At the expected horizontal position, determine if the video image is white for Flag(FlagPositionCounter) at one of the 3 possible scan count values. If I see white, for example, at HorzScanLine 5/6 the flag is at position 1, @ 10/11 position 2, @ 15/16 position 3. (I used 2 scan lines to account for interlaced scanning.)
5. Set a FlagArray (0-129) to a value of 1, 2, or 3 -- or, with 32 bits, this array could hold both the anticipated horizontal position and the vertical position.
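A rough sketch of the comparison step in Propeller assembly (hedged: the topline and position register names are mine, and the thresholds are just the example values from earlier):

              mov     position, #3        ' default: top edge at line 20 or below
              cmp     topline, #20 wc     ' C = (topline < 20)
if_c          mov     position, #2        ' top edge in lines 15..19
              cmp     topline, #15 wc     ' C = (topline < 15)
if_c          mov     position, #1        ' top edge above line 15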
In my application this would eliminate 260 switch contacts (other similar apps require 520), which have always been problematic regardless of switch type. The industry has tried reed, Hall effect, IR, phosphor-bronze wire, shorting bars, etc. All have failed from time to time due to different circumstances like temperature, humidity, and contact corrosion.
The ability to auto correct in software for physical changes would be a huge plus.
Once the full documentation for assembly programming is available I can start to give this some serious attention.
I'm considering doing line-oriented processing, as the previous posters have suggested, but this creates some problems.
1. Processing has to happen during a single horizontal scan line (or a small multiple thereof, if I use line buffers), if I don't want to pull lines from successive frames and risk tearing.
2. No way to do multipass processing of the image.
In either case, to do multi-cog video processing (which is really why I want the chip), I'm a wee bit concerned about RAM access latency.
Are there any reasonably speedy ways to attach some RAM to the current Propellers? Are there plans for Propellers with more shared RAM?
And where the hell's the rest of the manual?
An external flash A/D would also lighten the load significantly.
Depending upon the horizontal resolution your application requires, it is possible the internal RAM would be sufficient for line-by-line processing if it did not have to first determine where it was in the scan.
I agree on the manual issue. I cannot begin to work on this until I have a good handle on the assembly language required and, hopefully, a chart depicting the number of CPU cycles for each instruction.
I just found the section in my camera module's docs that specify how to do that. (I'm using an OV6620.) It might be workable, but the docs are a little fuzzy on how long I can hold the row charges without losing image quality, etc.
Since the camera's already generating digital output, I'm also considering building a framebuffer driven directly by its outputs, and having the processor read from that -- but that introduces a frame of latency.
Indeed; I'm working at CIF, so 288 lines of 352 pixels. At 16 bits lum/chroma, each line would take 176 longs; I can window the sensor smaller, or subsample if necessary -- but I'd like to avoid it.
If the individual cogs prove too slow or too RAM-constrained, I can always interleave the I/O functions between two of them.
I built a small simulation of real-time line processing on an architecture like the Propeller, and it seems to work. I'm not doing that much work -- frame differencing for motion detection, some simple color tracking, basically what the SX28 in the CMUcam does. Thanks to the tan and log tables in ROM, I may also be able to do my laser surface topography calculations entirely in the Propeller; they're already fixed-point and avoid division.
(Incidentally: Parallax, am I gonna have to implement Forth for this thing? We need a low-level compiled language.)
However:
This is the crux here. Without precise instruction timing information, I'm stuck with loose estimates gleaned from other chapters of the manual (22 cycles for an out-of-sync Hub instruction, 7 cycles in sync, 4 cycles for basic instructions, etc). I have no idea what the branch latencies are -- even the PICs have pipelines to stall -- so I'm writing without them.
Hell, at this point, I don't even know for sure that all instructions fit in 32 bits, so I have no idea if the routines I'm designing would fit in cog RAM.
I'm holding off buying Propeller equipment until I can get this info. I admit to having a real soft spot for weird architectures, and I'm known for tight hand-optimization, but without a good instruction set reference? I'd rather not reverse-engineer my micro.
All the info you need about the Propeller's assembly language is in this document:
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Chip Gracey
Parallax, Inc.
Post Edited (Chip Gracey (Parallax)) : 6/14/2006 5:29:42 PM GMT
Rock on! This should be sufficient to finish my model.
Nice orthogonal instruction encoding, btw. I like the way instruction predication is handled, and several of the instructions (the MUX variants in particular) will save wear on my copy of Hacker's Delight.
From a tools perspective, of course, this document doesn't cover how to bootstrap ASM code on the chip, but I know y'all are pretty closed about that sort of thing. I'll see if I can borrow a Windows box.
Couple questions to fill the holes in that document:
1. So, short of self-modifying code, there's really no way to do indirect addressing within the Cog's local storage? (That is, indirection of the D field specifically?)
2. Are there docs on what CALL and RET expand to in the standard assembler? It strikes me as being something like
; CALL #foo
        jmpret  foo_ret, #foo
; RET
foo_ret jmp     #0              ; S field modified on CALL
...but if it were that simple, I don't see why we'd need a macro, so I'm surely missing something.
3. Any docs on the WAITVID instruction?
4. How about the cycles from COGINIT/COGSTOP to the COG actually starting/stopping? (I'm sure it's a function of both cog numbers, but as long as I can predict it, it might well come in handy.)
Just have the initial COG that runs your spin code launch a single ASM program into itself and you can be completely in assembly thereafter, with COGs launching and stopping other COGs, and the works. Once all your code is running in COGs, you can reuse the entire main RAM for whatever you want. There are no rules.
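In Spin terms, that launch-into-itself pattern might look something like this minimal sketch (the entry label is an assumed name):

PUB Main
  coginit(cogid, @entry, 0)     ' reload this very cog with the PASM image

DAT
              org     0
entry         jmp     #entry    ' placeholder: real assembly code goes here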
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Chip Gracey
Parallax, Inc.
*nod*
The tool support issue I was referring to, however, is the fact that I don't own (or even have access to) a Windows box, so simply launching it from SPIN code isn't an option. Hence my desire for information on the bootstrap sequence, so I could roll a binary and program the chip from one of my Macs, or one of the Linux boxen at work.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Chip Gracey
Parallax, Inc.
I'll go read the VGA sources. I have no need for VGA or composite output in my project at the moment, but when I pick up a Propeller board I'll sure play with it some.
As for COGSTOP, are there any guarantees of how many cycles after COGSTOP the targeted COG stops? (Scarily enough, I'm actually thinking of using this for some state control in the I/O routine, since I can't squeeze enough cycles out of the critical path to actually include a "stop" mechanism.)
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Chip Gracey
Parallax, Inc.
What I find especially nice is that the buffer's address is preinitialized. This can also work for updating both the source and destination addresses simultaneously.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Life is one giant teacup ride.
Post Edited (Paul Baker) : 6/15/2006 2:03:55 PM GMT
Before :storeloop do you need to do movd :storeloop, fbufstart? Otherwise it seems that if you hit that loop multiple times the destination would always start wherever it last left off.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
OS-X: because making Unix user-friendly was easier than debugging Windows
links:
My band's website
Our album on the iTunes Music Store
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Life is one giant teacup ride.
The d_inc constant, 0x200, contains a 1 in the low-order bit of the D field. Thus, the instruction
...adds a literal 0x200 to the instruction at :storeloop, incrementing the D field.
Of course, if you did this 512 times, it'd overflow, which would convert the instruction to something along the lines of
In other words, it becomes a nop, which would be quite a surprise. Paul, I assume your buffer is under 512 words?
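Paul's listing isn't quoted here, but the loop under discussion presumably looks something like this sketch (fbufstart, sample, and count are assumed names), with the movd reset making the loop safe to re-enter:

:storeloop    mov     fbufstart, sample       ' D field preinitialized to the buffer start
              add     :storeloop, d_inc       ' advance the destination register
              djnz    count, #:storeloop
              movd    :storeloop, #fbufstart  ' reset the D field before the next pass

d_inc         long    $200                    ' a 1 in bit 9, the D field's low bit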
(Incidentally, Paul, your code is quite a bit tighter than my attempt at this same thing. I've been trying to get a read-and-fill loop under eight instructions for 16-bit data packed two per word.)
With one COG, the camera will have to be clocked at or around 13MHz (16-bit output is at clk/2, 12 cycles per looped read).
I'm concerned that writing the line out to shared RAM (5,296-5,311 cycles for CIF) looks to take longer than an entire line interval for the camera. One might need two COGs, taking turns reading lines. (The shared RAM write time could be reduced if I could pack samples two-per-long as they're being read, which would likely require COGs alternating on each sample.)
The good news is, buffering a single line with packed 16-bit samples will only take 176 longs of shared RAM.
What's really killing me here, besides the high latencies for shared RAM, are the four-cycle non-pipelined instructions. The raw power of the Propeller beats the SX28, sure, but the SX28 benefits from single-cycle I/O instructions and the traditional PIC-style four-stage pipeline.
Writing to hub memory, especially in blocks, can be quite time consuming. The biggest rub is that you need to update the source and destination address (WRLONG's destination is by pointer only, so the trick I used above doesn't work), as well as the loop instruction; this means you have 3 instructions between WRLONGs, so you end up missing the next available hub slot. If you are fetching values off the bus, it may be faster in the long run to write them directly to hub memory. Doing a "WRLONG from INA / increment hub address / DJNZ" allows you to catch every hub rotation; then another cog could go through and pick out the data and compact it. Whether the compacting cog could keep up is another question, but here is where you might use the alternating cogs for each scan line.
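In code, that inner loop might look something like this sketch (hubptr and count are assumed names):

:grab         wrlong  ina, hubptr     ' raw port sample straight to hub RAM
              add     hubptr, #4      ' advance the hub address by one long
              djnz    count, #:grab   ' 8 cycles of housekeeping, just in time
                                      ' for the next hub window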
That's what I love about the Propeller: if you are imaginative enough, you can find these cool little tricks to squeeze more performance out of it.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Life is one giant teacup ride.
Post Edited (Paul Baker) : 6/15/2006 9:51:04 PM GMT
Well yes. Considering that your code's rooted at 0x000, you'll likely have to start higher than that as well.
Yes, my current code assumes you can fit the shared buffer addresses within an immediate, so it'd have to reside entirely within the first 512 bytes of RAM. I can only do this with some quantization and similar nastiness. But, assuming one can do the code shuffling to pull this off, it's a simple add of 0x201.
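For byte-granular access, the combined bump would look something like this sketch (register names assumed; for words or longs the S-field step would be $202 or $204, since hub addresses count bytes):

:rdloop       rdbyte  0-0, #0-0           ' D and S fields patched each pass
              add     :rdloop, inc_both   ' +1 destination register, +1 hub address
              djnz    count, #:rdloop

inc_both      long    $201                ' bit 9 bumps the D field, bit 0 the S field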
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Life is one giant teacup ride.
I'm collecting the information I need to roll my own binaries and program them onto the chip. This'll mean enough SPIN to bootstrap the assembler code, and no more. In this case, the assembler can come up, jump into a routine in high RAM, and clear the lower chunk from there, possibly paging in more code in the process.
In the longer term, I hope to target a compiler to the chip, but we'll see. (I've already got an assembler, from the data Chip sent yesterday, but of course my "binaries" lack the necessary bootstrap preamble.)
Does this assembler not let you use directives to control where your code is loaded in RAM? Unfortunate. That's a pretty standard tool feature.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Life is one giant teacup ride.
Besides, it'd have to be VAR LONG res[128] -- longs are byte-addressed, so only 128 of them are addressable by a literal.
But other than that.
You should understand that we're not in love with Windows, either. It's just a fact of life for most of us. The ultimate goal of future Propeller chips is to completely stand alone so that NO box is necessary.
In the meantime, if you want to make a bootstrap loader for the Propeller, this is the only preamble you need:
Minimal Spin bootstrap code for assembly language launch
$0000: HZ HZ HZ HZ CR CS 10 00 LL LL 18 00 18 00 10 00
$0010: FF FF F9 FF FF FF F9 FF 35 37 04 35 2C -- -- --
$0020: your assembly code starts here - loaded into COG #0
elaboration:
$0000: HZ HZ HZ HZ - internal clock frequency in Hz (long)
$0004: CR          - value to be written to clock register (byte)
$0005: CS          - checksum so that all RAM bytes will sum to 0 (modulus 256)
$0006: 10 00       - 'pbase' (word) must be $0010
$0008: LL LL       - 'vbase' (word) number of longs loaded times 4
$000A: 18 00       - 'dbase' (word) above where $FFF9FFFF's get placed
$000C: 18 00       - 'pcurr' (word) points to Spin code
$000E: 10 00       - 'dcurr' (word) points to local stack
$0010: FF FF F9 FF - below local stack, must be $FFF9FFFF
$0014: FF FF F9 FF - below local stack, must be $FFF9FFFF
$0018: 35          - push #0   (long written to $0010)
$0019: 37 04       - push #$20 (long written to $0014)
$001B: 35          - push #0   (long written to $0018)
$001C: 2C          - COGINIT(0, $20, 0) - load asm code from $20+ into same COG #0
$001D: -- -- --    - filler
$0020: XX XX XX XX - 1st long of asm program to be loaded into COG #0
$0024: XX XX XX XX - 2nd long of asm program to be loaded into COG #0
$0028:             - rest of data
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Chip Gracey
Parallax, Inc.
Post Edited (Chip Gracey (Parallax)) : 6/16/2006 6:33:53 AM GMT
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Life is one giant teacup ride.
Chip, you rock. You've saved me numerous hours in a hex editor.
The 'vbase' value -- when you say "number of longs loaded," do you mean the longs of the machine code at 0x0020, or the overall word size of the programmed image?
And I should probably ask, since you've been so helpful: do y'all have any objections to third-party tools targeting the Propeller? It'd be non-commercial; my employer doesn't take kindly to commercial side projects.
Also, Paul:
Not necessarily. For example, in my (still simulated) OV6620 interface code, I don't have time to pack the 16-bit samples while reading from the camera. Likewise, packing them into longs before writing to main memory takes more than 8 cycles, so I miss a hub access window.
Net effect: writing using WRWORD takes twice as many writes, but uses the same total cycles.
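To put numbers on it, assuming a hub window every 16 clocks: WRLONG on every other rotation moves one long (two samples) per 32 clocks, while WRWORD on every rotation moves one word (one sample) per 16 clocks. Either way it comes out to 16 clocks per sample.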
I see your point on the WRWORD vs WRLONG.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Life is one giant teacup ride.