
Prop Vision Outline Recognition

Humanoido Posts: 5,770
edited 2011-05-29 18:51 in Propeller 1
I need to have a Prop see a rough image (rectangle, circle, square, or irregular shape), compare the outline with a simple stored outline, and get a percentage of match.

Ideally it would be a simple vision recognition program that operates in principle much like the Prop's speech recognition program, i.e., storing around 3 to 5 shapes for recognition.

Has anyone developed this, and if not, what resources are available for the prop to accomplish it?

Comments

  • Duane Degn Posts: 10,588
    edited 2011-05-26 11:18
    Humanoido,

    I think this is a great application for Hanno's machine vision method.

    I personally think I took the easier way with a breakout board for the ADC chip. I know you've seen the project I'm working on. I'm not sure what kind of filters I'll use with my system yet. I know I'll at least want to be able to find the brightest pixels in the image. I might try some shape recognition too.

    I don't remember if Hanno talks about using filters (software algorithms) in the tutorial I linked to above or if the filter information is just in the book The Official Guide ...

    Hanno's method is limited to black and white images. I know several forum members are working on color camera projects. You could also use color filters (plastic film) over the black and white camera's lens to gain color information. (The first color pictures from the moon used three different color filters with a black and white camera.)

    Duane
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2011-05-26 11:42
    Here's how I would attack the problem of shape recognition. You need to obtain "signatures" for each example and unknown shape. Try this:
    1. Find the centroid of the shape.

    2. For each of, say, 64 equally-spaced angles spanning a full 360 degrees, measure and record the distance from the centroid to the first edge encountered at the selected angle. This array of distances is the object's signature.

    3. To compare two objects for a match, slide their signatures against each other, one step at a time, wrapping to the beginning when you go past the end. This forms a set of number pairs against which a Pearson correlation coefficient can be computed. Do this for each of the 64 positions in the array and record the largest correlation value. This becomes the degree of match for the pair.

    4. Perform step 3 between the unknown and all the template signatures, and pick the one with the highest degree of match, assuming it's high enough.

    By using this technique, you should be able to identify shapes that are congruent to your examples, regardless of their scale or rotation.
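    A minimal C sketch of steps 1 through 3, assuming a binary image buffer (1 = object pixel); the function names and the ray-march edge search are illustrative, not a tested implementation:

    ```c
    #include <math.h>
    #include <stdint.h>

    #define NSIG 64                /* equally spaced angles per signature */
    #define PI   3.14159265f

    /* Step 1: centroid of a binary shape (assumes a non-empty shape) */
    void centroid(const uint8_t *img, int w, int h, float *cx, float *cy)
    {
        long sx = 0, sy = 0, n = 0;
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (img[y * w + x]) { sx += x; sy += y; n++; }
        *cx = (float)sx / n;
        *cy = (float)sy / n;
    }

    /* Step 2: distance from the centroid to the edge at NSIG angles */
    void shape_signature(const uint8_t *img, int w, int h,
                         float cx, float cy, float sig[NSIG])
    {
        for (int i = 0; i < NSIG; i++) {
            float a = 2.0f * PI * i / NSIG;
            float dx = cosf(a), dy = sinf(a), r = 0.0f;
            /* march outward until the ray leaves the shape or the image */
            for (;;) {
                int x = (int)(cx + r * dx), y = (int)(cy + r * dy);
                if (x < 0 || x >= w || y < 0 || y >= h || !img[y * w + x])
                    break;
                r += 1.0f;
            }
            sig[i] = r;
        }
    }

    /* Pearson correlation of a against b circularly shifted by 'shift' */
    static float corr_shifted(const float a[NSIG], const float b[NSIG],
                              int shift)
    {
        float sa = 0, sb = 0, saa = 0, sbb = 0, sab = 0;
        for (int i = 0; i < NSIG; i++) {
            float x = a[i], y = b[(i + shift) % NSIG];
            sa += x; sb += y; saa += x * x; sbb += y * y; sab += x * y;
        }
        float cov = sab - sa * sb / NSIG;
        float va  = saa - sa * sa / NSIG;
        float vb  = sbb - sb * sb / NSIG;
        return cov / sqrtf(va * vb + 1e-9f);
    }

    /* Step 3: degree of match = best correlation over all circular shifts */
    float degree_of_match(const float a[NSIG], const float b[NSIG])
    {
        float best = -1.0f;
        for (int s = 0; s < NSIG; s++) {
            float c = corr_shifted(a, b, s);
            if (c > best) best = c;
        }
        return best;
    }
    ```

    Because the correlation is unchanged by a uniform scaling of the distances, the scale invariance falls out for free; on a real Prop the float math would likely move to fixed-point, but the structure is the same.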

    -Phil
  • jazzed Posts: 11,803
    edited 2011-05-26 12:01
    Being able to match objects is a strength of neural networks. In the Parallel Distributed Processing book series (J. L. McClelland and D. E. Rumelhart), optical character recognition, for example, was a primary application of a neural network. Maybe you should add a post about that in your Big thread.
  • Rayman Posts: 14,877
    edited 2011-05-26 12:12
    A long time ago, I played around with 2-D matched spatial filters. I think they'd do what you want, but I'm not sure that the floating point ability of the Prop is fast enough for real-time analysis...
  • Duane Degn Posts: 10,588
    edited 2011-05-26 12:17
    I think it's interesting that we humans can see low-resolution images better with some movement.

    In the project thread I linked to in my earlier post, I have the word "HI" (as seen by the video camera) displayed on my LED array. It is much easier to tell what the image is with some movement. I wonder if there is a way to adapt the algorithm Phil mentions to use multiple images over time. As I think about the problem, the extra time dimension would make the program much more difficult to write, but I wonder if adding multiple frames would help a machine see better, just as they help us humans see better.

    Duane
  • jazzed Posts: 11,803
    edited 2011-05-26 12:24
    I did an application once for finding outlines by comparing sequential frames to get a "score" per x,y cell in a Cartesian plane, using simple statistics such as averages and standard deviations. It was satisfying.
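    One way such per-cell statistics might be kept is sketched below in C, assuming one grayscale value per cell per frame (a guess at the approach, using Welford's online mean/variance update; the type and function names are mine):

    ```c
    #include <math.h>

    /* Running mean/variance for one x,y cell, updated once per frame */
    typedef struct { float mean, m2; long n; } CellStat;

    void cell_update(CellStat *c, float v)
    {
        float d = v - c->mean;
        c->n++;
        c->mean += d / c->n;
        c->m2   += d * (v - c->mean);
    }

    /* Score = how many standard deviations the new value sits from the
       cell's history; high scores mark moving edges and outlines. */
    float cell_score(const CellStat *c, float v)
    {
        float var = (c->n > 1) ? c->m2 / (c->n - 1) : 1.0f;
        return fabsf(v - c->mean) / sqrtf(var + 1e-9f);
    }
    ```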
  • Humanoido Posts: 5,770
    edited 2011-05-26 23:44
    jazzed wrote: »
    I did an application once for finding outlines by comparing sequential frames to get a "score" per x,y cell in a Cartesian plane, using simple statistics such as averages and standard deviations. It was satisfying.
    Was this project using a Prop chip, and what was the input device?
  • Humanoido Posts: 5,770
    edited 2011-05-27 00:05
    Phil Pilgrim (PhiPi) wrote: »
    Here's how I would attack the problem of shape recognition. You need to obtain "signatures" for each example and unknown shape. Try this:
    1. Find the centroid of the shape.
    2. For each of, say, 64 equally-spaced angles spanning a full 360 degrees, measure and record the distance from the centroid to the first edge encountered at the selected angle. This array of distances is the object's signature.
    3. To compare two objects for a match, slide their signatures against each other, one step at a time, wrapping to the beginning when you go past the end. This forms a set of number pairs against which a Pearson correlation coefficient can be computed. Do this for each of the 64 positions in the array and record the largest correlation value. This becomes the degree of match for the pair.
    4. Perform step 3 between the unknown and all the template signatures, and pick the one with the highest degree of match, assuming it's high enough.

    By using this technique, you should be able to identify shapes that are congruent to your examples, regardless of their scale or rotation.

    Phil, this could supplement speech synthesis and speech recognition. Simple object recognition would be so useful on the Propeller chip. How would you recommend getting the image scan?

    I wish there were a low-cost, common item that could be used for image input to the Prop. I thought about using common photodetectors, but those become physically bulky in larger numbers. Some people have used LEDs as detectors, but their light sensitivity falls off.

    It would be spectacular to have the next Propeller Demo Board include a small speaker for speech, a mic for speech input, and a small image sensor/lens for object recognition.
  • kwinn Posts: 8,697
    edited 2011-05-27 00:17
    Start by getting an outline of the image. The simplest way to do that is to XOR the image with a shifted copy of itself. For an image of X by Y, set pixel(0,0) to pixel(0,0) xor pixel(1,1), set pixel(0,1) to pixel(0,1) xor pixel(1,2), etc. This way you end up with a high-contrast outline of the regions where intensity/color changes.
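    A minimal C sketch of the idea, assuming a one-byte-per-pixel buffer (my illustration, not kwinn's code):

    ```c
    #include <stdint.h>

    /* XOR each pixel with its diagonal neighbor: the output is nonzero
       only where intensity/color changes between (x,y) and (x+1,y+1). */
    void xor_outline(const uint8_t *in, uint8_t *out, int w, int h)
    {
        for (int y = 0; y < h - 1; y++)
            for (int x = 0; x < w - 1; x++)
                out[y * w + x] = in[y * w + x] ^ in[(y + 1) * w + (x + 1)];
    }
    ```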
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2011-05-27 00:25
    What kwinn is suggesting is a particular instance of a high-pass (convolutional) spatial filter. Here's a tutorial:
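    As one concrete example of such a filter, a 3x3 Laplacian kernel can be convolved over the image; a C sketch (my example, not taken from the tutorial):

    ```c
    #include <stdint.h>

    /* Convolve a grayscale image with a 3x3 Laplacian high-pass kernel;
       flat regions go to zero, edges produce large values. */
    void highpass3x3(const uint8_t *in, uint8_t *out, int w, int h)
    {
        static const int k[3][3] = { { -1, -1, -1 },
                                     { -1,  8, -1 },
                                     { -1, -1, -1 } };
        for (int y = 1; y < h - 1; y++)
            for (int x = 1; x < w - 1; x++) {
                int acc = 0;
                for (int j = -1; j <= 1; j++)
                    for (int i = -1; i <= 1; i++)
                        acc += k[j + 1][i + 1] * in[(y + j) * w + (x + i)];
                out[y * w + x] = (uint8_t)(acc < 0 ? 0 : acc > 255 ? 255 : acc);
            }
    }
    ```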

    -Phil
  • Dr_Acula Posts: 5,484
    edited 2011-05-27 05:15
    I did some work using the neocognitron a few years back. It is modelled on the real visual system and has several layers: cells in the earlier layers extract features like angles and curves, and cells in subsequent layers detect patterns of these. I was using it for speech recognition for bionic ears, and it seemed to do better in noisy environments than other methods. The neocognitron is used for character recognition, and its strength is tolerating deformity. One quite powerful demonstration is the ability to detect a character that has been rotated so it is upside down; each layer tolerates a bit of deformity. A more detailed explanation is here: http://www.scholarpedia.org/article/Neocognitron

    Back in the day I was doing this on a Z80, and because it is a neural network, it very much lends itself to parallel processing. I've written neocognitron software in Visual Basic and even in an Excel spreadsheet. I reckon it would do well on a multiprop network.
  • Rayman Posts: 14,877
    edited 2011-05-27 07:49
    Humanoido,

    Check out the paper at this link (takes a while to load):
    http://ro.uow.edu.au/cgi/viewcontent.cgi?article=1575&context=engpapers

    (I happen to be working on a SSD1928 based camera solution)
  • jazzed Posts: 11,803
    edited 2011-05-27 17:09
    Humanoido wrote: »
    Was this project using a Prop chip, and what was the input device?
    The device was a simple USB camera on a PC, with apps written in Java. The Propeller hadn't been released yet.

    Ever seen a Charles Schwab commercial where the actors appear partially animated? I was able to automatically generate the effect in a small way using a frame grabber and statistics as described. Processing speed was a limiting issue.
  • Beau Schwabe Posts: 6,568
    edited 2011-05-27 23:13
    Humanoido,

    Do you have an example 'image' of what your robot might be looking at? That might help to decide on the correct filter. An idea I have in mind uses binocular vision with a single camera.
  • Whit Posts: 4,191
    edited 2011-05-28 06:04
    Per Duane's comment, have you seen this demo of Hanno's ViewPort? This isn't your exact problem, but it demonstrates a possible way of getting at it.

    http://www.youtube.com/watch?v=yLzXThxZStE&feature=player_embedded
  • Humanoido Posts: 5,770
    edited 2011-05-28 08:18
    Beau Schwabe wrote: »
    Humanoido, do you have an example 'image' of what your robot might be looking at? That might help to decide on the correct filter. An idea I have in mind uses binocular vision with a single camera.

    The robot is the Big Brain, and it would look at itself. In this post:
    http://forums.parallax.com/showthread.php?124495-Fill-the-Big-Brain&p=1002718&viewfull=1#post1002718
    it was concluded that the Big Brain could be made self-aware by viewing and recognizing itself in a mirror.

    [Image: Silhouette of the Big Brain shows the basic rectangular shape for recognition]

    Because the Big Brain is in the shape of a tall, narrow rectangle, I think recognizing this basic geometric shape is the priority for self-recognition. Simple monocular vision at 1X would be sufficient, or a shorter-focal-length lens would work too.

    I'm still looking for a very simple, low-resolution way to get an image into the Prop, maybe 64 or 128 photosites. It would help if the home-made camera could be built from common parts, like the small phototransistors offered by Parallax, so students could duplicate the camera and do vision recognition too.
  • Rayman Posts: 14,877
    edited 2011-05-28 08:25
    I think a cheap NTSC camera and Perry's "Stupid video capture" is a nice way to get a low res, B&W photo into the Prop...
  • Humanoido Posts: 5,770
    edited 2011-05-28 08:27
    Whit wrote: »
    Per Duane's comment, have you seen this demo of Hanno's ViewPort? This isn't your exact problem, but it demonstrates a possible way of getting at it.
    http://www.youtube.com/watch?v=yLzXThxZStE&feature=player_embedded

    You're right - Hanno's work is really fantastic and serves as a remarkable model of what one can accomplish. What I'm thinking about is open-source Spin programming and a system that students can understand, along with a home-made camera built from simple components, so anyone can make their own vision recognition Prop system.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2011-05-28 09:40
    Humanoido,

    I think you trivialize "self-recognition." How, for example, would your Big Brain tell the difference between itself and other big brains of the same genre? It's not as if the image of a tall rectangle on the Big Brain's retina will suddenly instill consciousness, as with apes touching the monolith in 2001. That only happens in Hollywood. Computers are amazing, but they aren't magic.

    -Phil
  • Humanoido Posts: 5,770
    edited 2011-05-28 09:41
    Rayman wrote: »
    I think a cheap NTSC camera and Perry's "Stupid video capture" is a nice way to get a low res, B&W photo into the Prop...
    How cheap is the NTSC camera? Which one? Perry has an amazing video system. Can it work without the SD card?
  • Rayman Posts: 14,877
    edited 2011-05-28 09:46
    There's a $12 camera here:
    http://www.dealextreme.com/p/ntsc-mini-surveillance-av-camera-628x582px-6019

    You don't need SD card...
  • Humanoido Posts: 5,770
    edited 2011-05-28 09:59
    Phil Pilgrim (PhiPi) wrote: »
    Humanoido, I think you trivialize "self-recognition." How, for example, would your Big Brain tell the difference between itself and other big brains of the same genre? It's not as if the image of a tall rectangle on the Big Brain's retina will suddenly instill consciousness, as with apes touching the monolith in 2001. That only happens in Hollywood. Computers are amazing, but they aren't magic.

    You are right, I do make it trivial. Made ultimately simple, it could have the problem you mention: misidentifying the 2001: A Space Odyssey monolith (or some other rectangle with matching dimensions and the same angular measurement) as itself.

    But for something ultimately simple, I mean really, how often does that happen? It's not like we go out every day and discover the monolith sticking up in our front yard, or take our Big Brain to see the classic movies (OK, maybe the latter, but only if the Brain requests it), so I think we could get by with a trivialized system for demonstrative purposes.

    Maybe the Big Brain will have something unique sticking out on the side for greater self-recognition (just as some humans have a really big nose sticking out farther than others).

    As you once said in reference to machine intelligence and the AI community, it is important to start at the basic level and develop fundamentals rather than jumping in at the top right off the bat. The system could always be improved in the future.

    Computers and software are definitely not magic, but some people on this forum would have to agree that the results of some of your Propeller programs seem magical.
  • StefanL38 Posts: 2,292
    edited 2011-05-29 04:46
    If you only want to recognise the Big Brain itself and nothing else, give the Big Brain some IR LEDs flashing a certain pattern and use an IR receiver to analyse it.
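    A toy C sketch of the pattern check (the 8-bit pattern and the one-bit error tolerance are invented details):

    ```c
    #include <stdint.h>

    #define ID_PATTERN 0xB5u   /* arbitrary bit pattern flashed by the LEDs */

    /* Sample the IR receiver once per flash slot to build an 8-bit word,
       then accept it if it differs from the pattern by at most one bit. */
    int pattern_matches(uint8_t sampled)
    {
        uint8_t diff = sampled ^ ID_PATTERN;
        int errs = 0;
        while (diff) { errs += diff & 1u; diff >>= 1; }
        return errs <= 1;
    }
    ```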
    best regards

    Stefan
  • Perry Posts: 253
    edited 2011-05-29 05:12
    Humanoido wrote: »

    I have been working on a new version with the GreyTV driver and keyboard control instead of IR.

    I'm also doing some research on line/edge extraction algorithms.

    Perry
  • Humanoido Posts: 5,770
    edited 2011-05-29 07:26
    StefanL38 wrote: »
    If you only want to recognise the Big Brain itself and nothing else, give the Big Brain some IR LEDs flashing a certain pattern and use an IR receiver to analyse it.
    How well does a common mirror reflect IR light? I know IR can reflect off walls etc.
  • Humanoido Posts: 5,770
    edited 2011-05-29 07:27
    Perry wrote: »
    I have been working on a new version with the GreyTV driver and keyboard control instead of IR. I'm also doing some research on line/edge extraction algorithms.
    It sounds like you know how to make this system work!
  • Hanno Posts: 1,130
    edited 2011-05-29 18:51
    Hi,
    Computer vision typically requires lots of memory and processing power; for humans it occupies quite a bit of our brain. It's therefore not a natural fit for the Propeller.

    So, start with simple things: by finding the brightest spot you can track a laser pointer, or you can look for an artificial pattern that stands out from the environment. Reliably detecting arbitrary objects is very hard. I'm close to finishing some work on a serial camera, the C329, where I chose to slowly download the raw RGB pixels from the camera, allowing me to compute while the data streams in at ~1 fps. If you don't really need a fully embedded system, I highly recommend looking at the integrated OpenCV functionality of ViewPort. Lots of really cool vision research is done with OpenCV, and that functionality is much more compelling than just finding the brightest spot. Audiences are always wowed by my ViewPort "face finder" demo, which steers two servos using the Propeller.
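    For instance, the brightest-spot search over a grayscale frame is just a max scan; a minimal C sketch (not my ViewPort code):

    ```c
    #include <stdint.h>

    /* Return the coordinates of the brightest pixel in a grayscale frame;
       tracking a laser pointer means calling this once per frame. */
    void brightest_spot(const uint8_t *img, int w, int h, int *bx, int *by)
    {
        int best = -1;
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if ((int)img[y * w + x] > best) {
                    best = img[y * w + x];
                    *bx = x;
                    *by = y;
                }
    }
    ```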
    Hanno