Amazing Sand Physics demo uses 6 cogs to animate 10,000 grains of sand in realt

Dennis Ferron · 2007-02-25 07:17

Recently I extolled the virtues of a raster-based, non-tiled graphics system, and speculated about modifying the tv.spin object to make it non-tiled, to make it easier to do pixel-by-pixel raster graphics instead of tile-by-tile graphics. Well, I did that, and now here's a demo showing just what a non-tiled game engine can do!

If you run the demo, you will see a continuous "snow" of grains of sand falling from the ceiling, and piling up on the ground. If you watch it long enough, the grains will pile up into sand dunes, which shift and flow into a red sinkhole on the right. The really neat thing is that the interaction of the falling sand, the flowing sand, and the sinkhole is chaotic and unpredictable. Sometimes the sink hole absorbs almost all the sand, but then a little tiny dune will escape the sink hole and grow to enormous size. Then it flows and falls over the sink hole and completely covers it. For a while it looks as though sand is coming in faster than it can leave, but suddenly the whole thing collapses and gets sucked into the hole. Then the process starts again, but with variations every time! The level of sand in the terrarium never goes too high or too low.

And the best part is: all this complexity arises out of just two "rules" that govern the individual grains! I can animate 10,000 grains of sand with just 12K of memory and a small processor because the grains of sand are cellular automata. I got the idea from Conway's Life game. In the game of Life, cells are "alive" or "dead", and there are simple rules which govern invidual cells to create complex overall behavior. After seeing what arises in Conway's Life, I figured cellular automata would work for doing the physics for particle interactions, too, and guess what, it does!

There are only two basic rules which operate on the sand particles:

1. A sand particle will fall if there is an empty space beneath it.
2. If 3 or more particles are stacked up, the bottom particle will be pushed left or right out of the tower, if there is space.

This is the sand physics engine I'm developing for my ant farm game. I'll need some more rules to allow tunnels, but even as it is now it would be good for something like a scorched-earth game (two tanks shoot artillery at each other; the sand would make great craters where the shells hit).

Another good feature of cellular automata is that they are trivially parallelizable - you can assign as many or as few processors as you want to the task without changing your code. This is a perfect fit for the Propeller - as I demonstrate by assigning 6 cogs to run the sand engine at once. The grains of sand are so tiny and there are so many, if two cogs step on each other's toes and accidently erase a grain of sand or duplicate it, you'd never even notice anything.

Unfortunately it appears that after a while the 6 cogs *do* interfere with one another, as evidenced by the demo getting slower and slower over time until it is running at the same speed as it did with one cog. The algorithm doesn't care if two cogs try to read/write the same grains of sand at the same time, but maybe the Propeller hardware doesn't like it. When the demo is started, I use a waitcnt between starting each physics cog to try to keep the cogs looking at different zones of the screen so that they won't interfere with each other. Eventually one of the cogs will catch up with another one and *either* it is causing the cog to crash until only 1 is left, or the cogs are getting stuck in lock-step waiting for each other because of read and write locks. I'm not really sure how to fix this.

LoopyByteloose · 2007-02-25 08:21

Sounds like this should be a Parallax Demo application for shows and the demo board.

I have been wondering for quite sometime if a pixel specific video was ever going to be creatively used. [noparse][[/noparse]Though there are vector graphics, like Asteroids]
I am downloading right now and taking a look as you certainly got my attention.

Nonetheless, tiles really offer other convienences. They allow alpha-numeric data to handled with a minimum of hardware and they are efficent for the creation of real-time games.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
"If you want more fiber, eat the package.· Not enough?· Eat the manual."········

···················· Tropical regards,····· G. Herzog [noparse][[/noparse]·黃鶴 ]·in Taiwan

Dennis Ferron · 2007-02-25 17:54

Certainly. Tiles are good for some types of things; pixel-specific graphics are good for other, different things. There are clear advantages to a tiled system for most applications of the Propeller's graphics capabilities, but AFAIK there are not many demos of non-tiled graphics. Honestly I could have done this demo using the original Parallax-provided tiled graphics objects, and it would have run almost (but not quite) as fast, but it would have been more complicated to program for.

Just wait until I get Mode-X emulation working, then I can show how it's possible for a non-tiled system to do smooth scrolling yet use only 1 video page (no double buffering)!

Bergamot · 2007-02-25 18:48

You know, I'll come right out and say that I don't think that Cellular Automata are inherently parallelizable.

There might be rare cases where they are, but in general the potential interdependence of each cell on each other is going to cause memory issues like the one you noticed; there's simply no easy way around it.

Now what you could do is to split the simulation into six "buckets" where sand could fall, but was guaranteed not to touch sand from another bucket. Then you could store sand info in cog memory (you'd need some pretty tight packing to get it all in, but i think it's doable). I have a feeling that's not what you're after, though.

It is a cool looking demo though.

I did one of my senior projects on the use of cellular automata to predict racial segregation; maybe I should port those sims over to the prop.

Post Edited (Bergamot) : 2/25/2007 6:55:26 PM GMT

Dennis Ferron · 2007-02-25 19:21

Ok well I don't want to misuse terms so I'll just say that "in layman's terms" this particular algorithm is easy to run in parallel because each "cell" only looks at the cells very close to it, so you can run multiple cogs on it as long as they are more than 4 or 5 pixels away from each other as they work. So each cog works from bottom to top, line by line, and it's ok as long as all the cogs move from bottom to top at the same speed and maintain some separation between them. Problems ensue if a cog gets bogged down and a cog "below" it catches up to the line it's on. The thing with dividing the screen into buckets is that it's difficult to handle the edge cases where two buckets touch, and if you ignore the boundaries and just let cogs move sand in and out of each other's buckets, then there is still the possibility that another cog will be writing sand into your bucket area while you are trying to put a grain in the same spot. So instead of buckets I thought I'd just have all the cogs scan the whole screen, but dynamically insert waits after each line so that the cogs all remain exactly 1/6 of the screen away from each other as they scan. If one cog is getting a little bogged down (happens if there is a lot of sand) the others would all have to slow down too, to maintain an equal distance. There are still caveats with that; for instance a hole at the bottom of a sand dune can only "bubble up" to the top as fast as the scan speed of a single cog doing the sim no matter how many other cogs are used. Having more cogs doesn't make holes bubble up any faster, but it allows more holes to bubble up at once.

Bergamot · 2007-02-25 23:16

I still think you're going to be limited by the fact that the cogs cannot write to hub memory at the same time; they have to wait their turn, even if they're trying to write to different places.

Dennis Ferron · 2007-02-26 05:10

Here's a version of the demo that achieves smooth scrolling using only 1 video page, using a "mode x" method. It is the same as the original sand demo, except that now the whole background (including moving sand) scrolls in a continuous loop. It doesn't require any more CPU resources to do this over the nonscrolling version, because it doesn't require any memory moves (block-image transfers) to scroll.

Instead, I further modified the TV driver to accept an "x origin" that it uses to determine where to start each line as it scans. I then over-allocated the screen memory so that there are 64 more pixels per line than can be displayed on the screen. In other words, the display is a 256x192 rectangle, but this logical window actually resides in a physical rectangle in memory that is 320x192 pixels. There is now a wide strip of pixels that is present down the side of the large rectangle in memory that can be drawn on just like the rest of the display, but is not actually visible on the TV. You can then change the x origin to slide the logical 256x192 rectangle (what the TV is showing), left or right within the wider rectangle in memory. Scrolling is as simple as writing to just one parameter, changing the x offset the TV driver uses to start each scan. Since the TV driver only loads this parameter at the beginning of each frame, there is no tearing or shearing - this graphics scheme achieves all the benefits of double bufferring, without the RAM overhead of a second video page. You also don't need to wait for a v-sync to avoid shearing; you just change the parameter whenever you want to and the TV picks it up the next time it starts a new frame.

Obviously you can scroll left or right 64 pixels this way, but what happens when you scroll past the edge of the extra strip I added? Well I also made the TV driver automatically wrap back to the beginning if you go past the end of the line, so you can scroll past the edge of the rectangle and it will wrap around to the beginning. In this example it allows the demo to repeat the same background over and over, but in a game you could use this to scroll different things into view. You would simply write new scenery onto the strip of the video memory that is offscreen, and there would be no visual artifacts from that because it is not visible yet. As you scroll forward, that strip that was previously offscreen will come into view. While that strip is coming into view, old pixels are scrolling off screen. You can then overwrite the old pixels (which now can't be seen) with more new scenery, and so on. Since the display wraps around like a mobius strip, you never run out of room to scroll, and you never have to draw the whole screen at once, just tiny strips of it.

Tracy Allen · 2007-02-26 17:54

That's better than a lava lamp! Could the individual cogs randomize the x-y position where they work on each step, instead of working down line by line?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

Lawson · 2007-02-26 23:30

From glancing at the code I'd guess that the "sand physics" cogs are catching each other and getting stuck in lock step. I bet this could be solved by adding a synchronization method to the code. Say have a "done with line" variable in hub memory. Initialize it to 0 and have each cog increment it by one then wait till the "done with line" variable hits six? (hm... it'd need a lock to insure only one cog at a time does the "read increment write" operation.) Once the variable hits six the cogs continue to the next line and clear the variable after some small delay to allow time for every cog to see the final value of the variable.

my two bits
marty

DogP · 2007-02-27 00:02

Awesome! I was looking through the TV driver and started experimenting/modifying it to do non-tiled, but it looks like you just saved me the trouble... plus a demo to show how it works [noparse]:)[/noparse] .

Pat

Dennis Ferron · 2007-02-27 01:56

Wow guys thanks for the good words. DogP, you're welcome to re-use the modded TV driver. However, I want to warn you that since I had problems with the color palette, I hard-coded the colors into the TV driver for the time being - you can pass a color palette in but it ignores it. I had a bug in my code for getting the values I wanted out of the color table in the TV driver, and instead of fixing it I put a band-aid over the problem by making the palette a constant that you will find in the TV object. If you want to use a hard-coded palette, you can change the constant; if you want to be able to dynamically change the palette, it could be fixed, I just didn't spend the time then to find a solution.

Lawson: I tend to agree, but I'm going to write some code to test this theory - I want to see if multiple cogs do in fact slow down when writing the same memory. If that is what is slowing it down, I think I can fix it, but I want to be sure that's what the problem is.

Edit: In the long run it might be better to just try to make a single sand physics demo simply run faster on just one cog (optimize the code).

Edit: Yes, they could work off a random x-y position. It might look a little strange though, because sand on different parts of the screen might fall at different rates if they randomly get more "hits". With the whole-screen scan, everything gets equal consideration.

Post Edited (Dennis Ferron) : 2/27/2007 2:01:08 AM GMT

Dennis Ferron · 2007-02-27 03:32

It turns out that contention for memory write access is NOT the cause of the slowdown in my sand physics. I believe the test program I've attached proves that cogs do not get caught in "lock step" if they try to read/write the same memory. I initially fell for believing that they did, but then I began to wonder about the validity of that assumption, since the Propeller is supposed to have deterministic timing. It turns out that the timing is indeed deterministic - reads and writes to hub memory take the same amount of time no matter whether other cogs are writing to it or not, and there are no "automatic write locks". Such assumptions are simply hold-overs from being used to other computing concepts such as PC multithreading or SQL database locks; the Propeller doesn't work like old systems and it's bad to carry too much baggage with you when trying to understand how it does work. Each cog simply gets a turn to access memory in a round robin fashion and it makes no difference even if other cogs are writing to the same memory location.

The way the test program works is that it runs an assembly language program that increments two counters. One counter is private to the cog and records how many times the cog's program loop executed. The other points to a shared counter whose value is not important, but which is read to and written from to demonstrate "access contention" - all the cogs running the test try to simultaneously read and write this shared counter at once. Just to be doubly sure that there will be a "collision" will occur, each loop reads and writes the same data to the shared memory location dozens of times in a row, so that the loop spends most of it's time banging on the shared memory, and if a collision could possibly occur, it should happen often.

It runs 3 copies of the assembly language program in 3 different cogs, with 3 private counters and one shared counter. The Spin program running on a fourth cog (the original cog) takes a snapshot of the three counters, uses the hardware counter (waitcnt) to wait for a period of time, and then takes another snapshot. This gives you a delta value telling you how many times each cog looped during the pause. Now, if the cogs were having to wait upon one another for read or write access to the shared location, then with three of them contending they would have to wait longer for access. So you would expect that when you run three of them together, your delta values (differences in loop count) would be lower than one cog running by itself at full speed with no contention. This is not what you see.

On a Propeller running at an 80MHz clock speed, you should see a delta value of about 3E9C to 3E9D for all three cogs. If you comment out one cog-start line and run the test with just 2 cogs, it is still 3E9C to 3E9D. If you comment out two cog-start lines and run the test with just one cog, it is still 3E9C to 3E9D. The loop speed in the test is wholly unaffected by the number of cogs that are writing and reading the same shared memory location. Therefore, it is impossible that my sand-physics cogs are slowing each other down due to write contention - frankly, write contentions do not exist on the Propeller, due to its spiffy architecture.

That still leaves me in the dark about the actual cause of the slowdown on the sand demo. Given the results of this test, it seems unlikely that cog to cog interaction has anything to do with it.

DogP · 2007-02-27 16:04

Heh, thanks for the heads up on the color palette... it's not a big deal for me at the moment since getting the data I want into the buffer is gonna be the hard part, but eventually being able to change colors would be good [noparse]:)[/noparse] .

Pat

mpark · 2007-06-14 00:49

Newbie question here: What do I need to do to make the sand demo work on a Hydra?

potatohead · 2007-06-14 01:21

You need to change the video output pin group and the clock PLL mode. I think that's it. Check out this thread for the details, should be no more than a few edits:

http://forums.parallax.com/forums/default.aspx?f=33&m=185675

Bob the Builder on a C64 · 2007-06-14 02:42

The demo doesn't look right on the Hydra, 10,000 grains don't fall when you translate it that way, instead it falls like sand, not in grains, just in lines. Did this happen to any of you?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Realize that I am really a mad scientist··· and

Don't forget it!

http://raydillon.com/Images/Illustration/GameArt/WildIsle/WildIsle-Ink-ScientistClose.jpg

·

Rob7 · 2007-06-14 03:15

Cool,
I like the dropping sand effect.
rob7

mpark · 2007-06-14 05:09

Well I got something, but as Bob says above, it doesn't look right. Of course, I don't know what it's supposed to look like, but for one thing the falling grains leave blue trails and pretty soon the background is practically all blue.

CardboardGuru · 2007-06-14 08:38

Hi Dennis,

This is excellent. I definitely looks like we're interested in the same sort of things. I don't know if you tried out Fantasia or not, but the second version has particles. And a few days ago when I released it, 256 particles looked impressive. Now you've blown it out of the water with 10,000!

Of course we've taken very different approaches. You store your particles as pixels in a frame buffer. Mine are a list of X,Y coordinates of pixels. Good for different things. Yours good for large numbers of particles with celluar interactions between neighbours, just as you've done. Mine for starfields and sparks - no interactions between particles, but indiviidual velocity and acceleration.

I had been thinking about another approach for particles, which again has it's uses and drawbacks. Storing them as run-length encoded, either vertical or horizontal. And that ties into what Baggers is doing with his run-length encoded TV driver.

Post Edited (CardboardGuru) : 6/15/2007 6:24:56 PM GMT

CardboardGuru · 2007-06-14 10:18

Hi Dennis,

I don't think any of your cogs are dying, or that you're getting into any clash conditions, or any such thing. I think what you are seeing is that difference in speed of various paths through the sand-physics code. When the demo starts hardly any of the screen is sand. So that path through the code is really short. Then, falling sand or sand on the top level of the dunes is faster than buried sand because the IsTower subroutine has a loop in it. A really small loop, but it's multiplied thousands of times.

I've come to this conclusion by doing a quick blinkenlights hack. Changes are described below. This gives a column of pixel-pairs at the top of the screen, one for each cog. They cycle through the 4 colours, once per frame. The fact that they all keep flashing means all the cogs are still alive. And they keep up their same pattern, one colour change following the other, but just gradually slowing down as more sand appears.

See what you think.

C:\Hydra\downloads\SandDemo\SandDemo.spin
Change
< gy := 10
---
> gy := 20

C:\Hydra\downloads\SandDemo\sand_physics.spin
Add the following after the line: mov sandy, #(ScreenH - 1)
> cogid py
> sub py,#1
> mov px,#16
> call #GetPixel
> add pixelcolor,#1
> and pixelcolor,#$3
> call #SetPixel
> add px,#1
> call #SetPixel

C:\Hydra\downloads\SandDemo\sand_physics.spin
Change:
< djnz sandy, #:loop_y
---
> sub sandy,#1
> cmp sandy,#16 wz,wc
> if_a jmp #:loop_y

Post Edited (CardboardGuru) : 6/15/2007 6:25:20 PM GMT

Lawson · 2007-06-14 14:38

yea, CardboardGuru is going over in more detail what I mentioned farther up the thread.

"From glancing at the code I'd guess that the "sand physics" cogs are catching each other and getting stuck in lock step."

I was thinking of the high level logic when I said that. I assumed that the time to complete a scan line depended on the ammount of sand on that line. If that was the case, a physics cog could get bogged down enough for the cog behind it to catch it. Now, since (as you've proven) each cog in the Prop acts 'independently' no hardware failure occured. I think it's simply that the two cogs now read the same data in one hub cycle, take the same time processing the data, and then write out the same results in another hub cycle. Hence they're now operating in "lock step" and it looks like one cog crashed.

add a re-synchronization system or make the physics cogs repel each other and I think this'll go away.

Sorry I wasn't clearer before,
Marty

CardboardGuru · 2007-06-14 16:45

Hi Marty,

I don't see why they would "get stuck in lockstep", even if one somehow managed to catch up with another. There are no locks used. And no waiting loops. I imagine they all just get slower as the sand builds up at the bottom of the screen.

On the other hand, it's probably worth working out a scheme to keep the cogs more or less evenly spaced on the screen to minimise the jerkiness of the animation.

Lawson · 2007-06-15 09:24

aargh, it's quite simple really. when two cogs are processing the SAME grain of sand, they start with the same data. Since they start with the same data, each cog follows an identical path through the code. Since both cogs run the exact same instructions on the exact same data, they take the exact same number of clock cycles to finish. After all this both cogs then write the result to Hub ram. Since both cogs write the SAME result at the 'same' time (between 2 and 14 clocks apart really) It now looks like one cog has died. No locks are needed. No special waiting loops. Simply having the same code and working on identical data is enough for perfect synchronization.

ever heard of Fireflies synchronizing? Certain species of fireflies will spontaneously synchronize, this results in whole trees blinking on and off at the same time. What I'm describing is the same effect, it just has different underlying causes. www.jcu.edu/math/faculty/dpalmer/Animations.html is the best link I could find with Google. Unfortunately the Java animation's won't load for me so don't treat it as the final word. (lots of nice cryptic technical info found with Google too)

CardboardGuru · 2007-06-15 10:12

But the blinkenlights perceptibly change colour in a progression, not flash together.

It's pretty easy to test. Let's see if the color of the pixel changes inbetween each cog reading it and writing it.
Add the following lines at the top of the loop, after px and py is initialised.

call #GetPixel
mov steve,pixelcolor

Then these at the end of the loop before movex and movey are added in:

call #GetPixel
cmp steve,pixelcolor wz,wc
if_ne mov dira,#1
if_ne mov outa,#1 'lights the HYDRA debug LED

After a short while the LED comes on. But wait a second, because the DEMO file is also writing to the screen. Those red blocks.
Comment out the following:

' Create a barrier
repeat gx from 80 to 100
repeat gy from 100 to 102
SetPixel(gx, gy, 2)

' Create a sink-hole
repeat gx from 200 to 207
repeat gy from 180 to 188
SetPixel(gx, gy, 2)

Now a couple of things happen. First of all the LED doesn't come on. So at no point are the cogs processing the same pixel. Secondly, without the sink hole, the sand keeps building up, and the processing gets slower and slower, and again it's easiest to see that with the blinkenlights.

Post Edited (CardboardGuru) : 6/15/2007 10:40:59 AM GMT

Dennis Ferron · 2007-06-15 18:11

Wow thanks guys. I was away on a week long trip when this thread fired back up, else I would have replied sooner.

I'm aware of the phenomenon of fireflies synchronizing; that is indeed an ingenious hypothesis for the cause of this situation. That is a very interesting observation that if they ever got to the same place, all the cogs would follow the same paths through the code and never break out of step. So it'd appear as though no work is being done by the other cogs. If you are correct, then we may have stumbled upon a very important discovery about parallel computation. (It seems to me that if this kind of thing can actually happen, it's a kind of problem that could be generalized & be very important to know about.)

However, Occam's razor (the simplest explanation is the most likely to be true) leads me to think maybe it is just that there is more sand than before. I can test this second hypothesis by simply running one cog with more sand and with less sand, and seeing what the speed difference is. OTOH, I can test the lock-step hypothesis by kicking the cogs out of step.

I'm not at home right now so I haven't had a chance to try Cardboardguru's blinken lights, to verify whether it does disprove that the cogs are in lock step.

BTW Cardboardguru, I'm not Brian, I'm Dennis. [noparse]:)[/noparse]

To the person who tried the demo on the Hydra, the blue streaks are not a bug; I did that deliberately. I don't have to do it that way, but I just thought it looked more interesting if the first bits of sand changed the colors of the sky from black to blue. The eventual intention is to have the background scroll and continuously generate new dunes, with new black sky. So instead of filling the whole sky with blue, the sky would be black and blue streaks all the time as you moved forward. However, it's possible the TV driver doesn't look quite right on the Hydra, so maybe what you're seeing isn't identical to what I have. I'll have to try it on my Hydra when I get home and see.

CardboardGuru · 2007-06-15 18:25

Sorry Dennis, dunno where I got Brian from. Some kind of multiprocessing bug In my head I guess.

Chris Merck · 2007-06-20 00:05

Is there a VGA version? My TV is, er..., non-existant!

If not, I will see what it takes to get a "raster-vga".

Dennis Ferron · 2007-06-22 05:21

Well I wouldn't recommend trying to convert the existing tv-mode-x driver to vga; it would be a lot of work. Instead, I would take the VGA driver, and convert it to use the same flat buffer format as tv-mode-x. My flat screen won't display resolutions lower than 800x600, so I haven't bothered with much experimentation with the VGA stuff myself.

Amazing Sand Physics demo uses 6 cogs to animate 10,000 grains of sand in realt

Comments