Using the CMU cam as a robotic eye (Has it been done?)
I heard that the CMU cam navigates by watching the floor, and when a color difference is detected (an object) it will know when to turn. Can you make the CMU cam work like a human eye? I was thinking of putting a Ping sensor under it so it could "see" the object and know how far away it is. Could it navigate like that, or is that too advanced for the CMU cam? Just curious. -SciTech02
Comments
The point being, they're in different domains, with different purposes, but sometimes they accomplish the same goals. In the submarine, that goal is moving through the water. In the computer, that goal is reaching conclusions and reacting accordingly. This is all triggered by your phrase "really 'seeing' the object". The CMU-Cam can't "see" in the sense of drawing conclusions about what the pattern on its CCD represents. "Watching the ground for a large color change" IS one way the CMU-Cam interprets its data.
Also, there are lots of people on here with real problems, trying to develop real solutions. Asking a question because you're "just curious" is not likely to get a lot of responses.
Now, the CMU-Cam is a brilliant concept. It was created by two guys at Carnegie-Mellon University, with the purpose of having a 'line-follower' that could 'look ahead'. It has an on-board SX processor, which runs fast enough that the CMU-Cam can pick out contrasty regions and send simple messages to a controlling computer about where those regions are. Given a particular color, it can even control a panning servo to keep a ball (or object) of that color centered in the camera.
And I believe you can 'dump' what is in effect a low-res screen capture of what it is seeing -- but analyzing two consecutive frames is probably way beyond what a BS2 can do. Google CMU-Cam for more information.
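As a rough illustration of those 'simple messages', here is a minimal sketch of reading the camera's color-tracking output on a PC with pyserial. The packet layout and baud rate are from my reading of the CMUcam1 manual, so verify them there; the port name is just a placeholder.

```python
# Sketch: parsing the CMUcam1 color-tracking stream on a PC.
# Assumption: the documented middle-mass packet looks like
#   "M mx my x1 y1 x2 y2 pixels confidence"
# and the default baud rate is 115200 -- check both against the manual.
import serial

cam = serial.Serial("/dev/ttyS0", 115200, timeout=1)
# (a TC "track color" command would be sent first to start the stream)

def read_track_packet(ser):
    """Return (mx, my, confidence) from the next 'M' packet, or None."""
    line = ser.readline().decode("ascii", errors="ignore").strip()
    fields = line.split()
    if len(fields) < 9 or fields[0] != "M":
        return None
    mx, my, confidence = int(fields[1]), int(fields[2]), int(fields[8])
    return mx, my, confidence
```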
Oh, and with a SINGLE CMU-Cam, you'll probably ALSO need a distance-determining device. With TWO CMU-Cams, and a lot of processing horsepower, you MIGHT be able to use parallax to find distance, but it would be non-trivial.
Yes, the CMU-Cam does have a 'frame-grabber' mode. You'd have to look at the docs for it to find its resolution -- I think it's rather low, which reduces the amount of processing you need to do to it. It may be more appropriate for a PC/laptop based solution though -- the BS2 is really limited in horsepower when it comes to that kind of processing. In the 'frame-grabber' mode, the CMU-Cam will output a serial stream of the pixels it most recently captured.
The SX/28 and SX/48, however, definitely DO have the horsepower, if not the memory space.
There was a good series of articles in either SERVO or Nuts & Volts on image processing a short while back, using the power of a PC to do most of the "heavy lifting"...definitely worth a read to get a better feel for some of the basics as they are currently interpreted.
That being said, it is possible to use something like the CMU cam to capture and "interpret" a limited amount of information. Keep in mind that it will generally only be "aware" of a 2-dimensional space. If you add an ultrasonic sensor, you can give your 'bot a crude idea of the 3rd dimension (at least of what it is looking directly at...most of the time...within certain parameters).
The BOE-Bot CMUCam is identical to the "stock" CMUCam in all respects except that it communicates at TTL rather than RS-232 levels, as far as I know.
Please keep us informed...
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Truly Understand the Fundamentals and the Path will be so much easier...
How come you would need more than one camera to do that?
Also, I read up about ASIMO. He uses two camera eyes to navigate around his environment.
And, I was on some site (I can't remember which) that was selling one that they said you could integrate into a robotic platform. It was a big site that sold all types of robot stuff. I think it was called (Something) Robotics. If anyone knows of this site, please show us.
Here's a link about this field of robotics. It has links at the bottom to other related stuff, like the smart cameras I talked about.
http://en.wikipedia.org/wiki/Machine_vision
-SciTech02.
Anyway, SciTech: One thing you could do (if you have the allowed space and size) is have a webcam or similar feed the frames to a computer, which would then analyze them. It depends, I suppose, on how much detail you need out of the image and how fast you need the frames analyzed. You may be able to use a handheld computer (or maybe even the Propeller when it comes out) if the frames are low-res enough.
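A minimal sketch of that webcam-to-PC idea, assuming something like OpenCV is available on the PC side (any capture library would do): grab a frame, convert it to greyscale, and shrink it so the analysis step has less to chew on.

```python
import cv2

cap = cv2.VideoCapture(0)                 # first attached camera
ok, frame = cap.read()                    # one frame, as a BGR array
if ok:
    grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(grey, (80, 60))    # low-res is plenty for blob/edge work
    # ...hand `small` to whatever analysis comes next...
cap.release()
```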
The ping sensor under the camera is a perfectly valid idea. I am actually going to do something very similar to that on my next robot. The only thing you have to be aware of is that the Ping may have a greater cone of detection than the camera's field of view, and the Ping will read the distance to the closest thing in its detection cone.
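For reference, the Ping))) arithmetic itself is simple. A sketch, assuming the raw echo time (in microseconds) is shipped back from the Stamp (the same math can be done in PBASIC); remember the result is the distance to the nearest thing anywhere in the cone, which may not be the object the camera is centered on.

```python
def ping_cm(echo_us: float) -> float:
    """Convert a PING))) round-trip echo time (microseconds) to centimeters.

    Sound travels roughly 0.0343 cm per microsecond at room temperature,
    and the echo time covers the trip out and back, hence the divide by 2.
    """
    return (echo_us / 2.0) * 0.0343

print(ping_cm(2900))   # a ~2900 us echo is roughly 50 cm
```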
When using vision-only systems, you need two viewpoints to estimate distance, by calculating the parallax of a nearby object with respect to a distant object.
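The underlying relationship is worth writing down. With an idealized pinhole/stereo model (real lenses need calibration), depth falls out of the pixel shift (disparity) between the two views; a sketch:

```python
def depth_from_disparity(focal_px: float, baseline_cm: float,
                         disparity_px: float) -> float:
    """Depth Z = f * B / d: larger disparity means a closer object."""
    if disparity_px <= 0:
        return float("inf")        # no measurable shift: effectively far field
    return focal_px * baseline_cm / disparity_px
```

This is also why the spacing between the cameras matters: a wider baseline produces more disparity (and so better depth resolution) for distant objects, at the cost of a larger minimum distance at which both cameras still see the same thing.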
The biggest issue is the whole AI aspect that arises from doing these things; it's a black-magic field with mixed results. If you can provide a very structured environment, they can do quite well, but take them out of the structured environment and they can start behaving erratically. Almost all of the issues arise from deciding what is an object and what isn't. Let's say I'm holding a soccer ball over my head as though I were about to do a sideline throw-in. The bot would see me and the soccer ball, see that we are at the same distance, and conclude that the soccer ball and I are one object. Now if I let the ball fall in front of me, the bot sees the motion of the ball but not of me, and must realize that what it thought was one object is actually two, and act accordingly. Now when I pick the ball up and place it over my head, does the bot go back to thinking only one object is present, or does it still see two? These are the types of decisions that are easy for a human but not so easy for a robot.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
6+6=10 (Long live the duodecimal system)
The edge detection simplifies the image: instead of worrying about the full gamut of colors, you are only concerned with the parts of the picture where the color changes relatively quickly. The resulting image is greyscale: black = no color edges, white = a very sharp color transition. The correlation compares the two edge-detected frames; if the two cameras are perfectly aligned (they won't be), an object in the very far distance (far field) will be in the exact same place in each frame.
Sub-frame correlation takes a region from each frame and mathematically slides one over the other, trying to find an alignment of the two; the output of the correlator peaks when they are aligned. The shift value corresponding to the peak gives the distance measurement of the object being examined.
Object detection and recognition come into play when determining the boundaries of the subframe: since you want to find the distance of a single object, you want a subframe that includes as much of the object as possible while excluding areas that don't belong to it, because this increases the reliability of the distance measurement produced by the correlation method.
There are other ways of reaching the same goal, but this is the one I understand.
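A rough sketch of the pipeline just described, under some assumptions: the frames are already software-aligned greyscale numpy arrays, a crude gradient edge detector stands in for whatever the real system used, and the search direction depends on which camera is which.

```python
import numpy as np

def edge_map(img: np.ndarray) -> np.ndarray:
    """Gradient-magnitude edges: black = flat color, white = sharp transition."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

def disparity_by_correlation(left: np.ndarray, right: np.ndarray,
                             row: int, col: int,
                             size: int = 16, max_shift: int = 32) -> int:
    """Slide a subframe of the left edge map across the right edge map;
    the shift with the highest correlation score is the disparity estimate."""
    el, er = edge_map(left), edge_map(right)
    patch = el[row:row + size, col:col + size]
    best_shift, best_score = 0, -np.inf
    for s in range(max_shift):
        c0 = col - s
        if c0 < 0:
            break
        cand = er[row:row + size, c0:c0 + size]
        if cand.shape != patch.shape:
            break
        score = float(np.sum(patch * cand))   # correlator output for this shift
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift                         # feed into Z = f * B / disparity
```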
I omitted another step used within the system. Since it's nigh impossible to align two cameras so that they are pointing pixel-by-pixel in the same direction, it used a 2D correlator calibration step on a far-field image. This created a software alignment of the two cameras, where the frames were translated (moved x, y pixels) before doing any further operations, so they are then in perfect alignment (barring any optical aberrations, which is a whole other story).
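A sketch of that calibration step, assuming the misalignment really is a pure translation: correlate two far-field frames once, take the peak location as a fixed (x, y) offset, and apply that offset to every later frame before doing any disparity math.

```python
import numpy as np

def alignment_offset(frame_a: np.ndarray, frame_b: np.ndarray):
    """Return the (dy, dx) translation that best aligns frame_b to frame_a,
    found as the peak of their FFT-based circular cross-correlation."""
    a = frame_a.astype(float) - frame_a.mean()
    b = frame_b.astype(float) - frame_b.mean()
    corr = np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(b))).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # shifts bigger than half the frame really mean a small negative shift
    if dy > a.shape[0] // 2:
        dy -= a.shape[0]
    if dx > a.shape[1] // 2:
        dx -= a.shape[1]
    return dy, dx
```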
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
6+6=10 (Long live the duodecimal system)
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
6+6=10 (Long live the duodecimal system)
Also, all of these processors have an Ethernet connection; is that important?
One more thing: the two cams, how far apart should they be? Should they be right next to each other, or two inches apart? -SciTech02.
http://www.pixelsmart.com/ps512-8.html
Or this one...
http://www.datatranslation.com/products_hardware/prod_dt3120.htm
-SciTech02.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
1+1=10
Now, here's a recap of how to navigate with CMU cam(s); am I right?
1: Digitize the image(s) and send them to a processor.
2: Process the image(s) and use the parallax method to find the distance of an object.
3: Send what was detected to the Stamp (BS2), and act on what it "saw".
I know how to do steps 1 and 2, but I'm not too sure about step 3.
Thank you for your help so far. -SciTech02.
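For step 3, here's a minimal sketch of one way to do it; the protocol, port, and baud rate are all assumptions, not anything from the docs. The PC finishes the vision work and pushes a tiny framed message down a serial line; on the Stamp side, a SERIN with a WAIT qualifier would pick up the two data bytes and steer accordingly. Keeping the message to a couple of bytes matters, since the BS2 has very little time and RAM to spend on parsing.

```python
import serial

stamp = serial.Serial("COM1", 9600, timeout=1)   # adjust port and baud

def send_result(bearing: int, distance_cm: int) -> None:
    """Send one framed measurement: a '!' sync byte, then bearing (0-255,
    128 = straight ahead) and distance in cm, each clamped to one byte."""
    bearing = max(0, min(255, bearing))
    distance_cm = max(0, min(255, distance_cm))
    stamp.write(bytes([ord("!"), bearing, distance_cm]))

send_result(120, 40)   # e.g. object slightly left of center, about 40 cm away
```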
Perhaps you could use some wireless cameras and a couple of TV cards that have the ability to snapshot the input from a program (i.e., ones with a development library; I don't know offhand which, if any, do).
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
1+1=10
Here's some more detail on how ASIMO navigates with two cams (sadly, it's not a lot):
Using the visual information captured by the camera(s) mounted in its head, ASIMO can detect the movements of multiple objects, assessing distance and direction. Common applications this feature would serve include the ability to follow the movements of people with its camera, to follow a person, or greet a person when he or she approaches.
ASIMO can recognize the objects and terrain of his environment and act in a way that is safe for both himself and nearby humans. For example, recognizing potential hazards such as stairs, and by stopping and starting to avoid hitting humans or other moving objects.
Maybe I could find a way to put a bus on my robot (is that even possible?). Or maybe just go back to the idea of putting an ultrasonic under the cam (but the cone of capture is larger than the image itself). I could use a laser rangefinder (but that's expensive). I read about a DIY rangefinder where you shoot a laser pointer at the object, and depending on how big the dot is on a camera, you can find out how far away the object is (but having a laser pointer on all the time isn't good, and it could hurt someone).
You have helped me a lot so far, thank you for that. -SciTech02.
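On the laser-pointer idea: the DIY write-ups I've seen actually use where the dot lands in the frame rather than how big it is, which is straightforward triangulation (the laser mounted parallel to the camera axis, offset a few centimeters). A sketch, with every constant a placeholder to be found by calibrating against a few known distances; the safety caveat still applies, since the dot has to be on whenever you want a reading.

```python
import math

H_CM = 5.0              # laser-to-camera-axis separation (placeholder)
RAD_PER_PIXEL = 0.0012  # angle per pixel of dot offset (from calibration)
RAD_OFFSET = 0.0005     # fixed alignment error term (from calibration)

def laser_range_cm(pixels_from_center: float) -> float:
    """Distance to whatever surface the laser dot is sitting on."""
    theta = pixels_from_center * RAD_PER_PIXEL + RAD_OFFSET
    return H_CM / math.tan(theta)   # simple right-triangle geometry

print(laser_range_cm(100))   # ~41 cm with these placeholder constants
```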
One thing about this direction I didn't mention is that wireless cameras introduce noise into the picture (unless you buy really expensive units), so there would have to be a "de-noising" stage added to the processing. Though perhaps if you get Wi-Fi Ethernet cameras, the error detection used for 802.11b would effectively denoise the transmission for you (this would also remove the TV-tuner requirement).
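If you do end up with a noisy analog feed, one cheap "de-noising" stage is a per-pixel median over a short burst of frames grabbed while the robot holds still. A sketch, assuming the frames arrive as same-sized greyscale numpy arrays:

```python
import numpy as np

def denoise_burst(frames) -> np.ndarray:
    """Per-pixel median over a burst of frames; knocks out the sparkly
    single-frame RF noise while preserving the (static) scene."""
    stack = np.stack([np.asarray(f, dtype=np.float32) for f in frames])
    return np.median(stack, axis=0)
```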
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
1+1=10
There does seem to be a lot of interest in this... I've seen Wi-Fi discussions as well as RF links... The Wi-Fi system, I think, would be better for most, since many people already have it running in their homes, etc.
Finding professional reviewers doing side-by-side comparisons of wireless cameras is harder than finding the proverbial needle in a haystack. Just be sure to buy from a company with a liberal return policy (that's why I linked to Newegg).
You should look for a camera which supports TCP/IP or UDP connections, since those will provide the lowest latency for acquiring the image.
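For a camera that exposes a snapshot over TCP/HTTP, grabbing a frame can be as simple as the sketch below; the address and path are hypothetical, so check the camera's documentation for its actual snapshot or streaming endpoint.

```python
from urllib.request import urlopen

def grab_snapshot(url: str = "http://192.168.1.50/snapshot.jpg") -> bytes:
    """Fetch the camera's current frame as raw JPEG bytes."""
    with urlopen(url, timeout=2) as resp:
        return resp.read()

with open("frame.jpg", "wb") as f:
    f.write(grab_snapshot())
```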
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
1+1=10
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
There is always an answer.
There is always a way.
There is always a reason. -SciTech02.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
1+1=10
Break the project down into smaller sub-goals and achieve them individually...then integrate them into the whole. I only recommend this to keep you from dropping the project if you try to do it as one piece and realize that each of the little pieces can be a challenge unto itself.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Truly Understand the Fundamentals and the Path will be so much easier...
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
1+1=10
Also, I am breaking the project down into smaller ones. First I have to learn how to program the cogs to do what I want them to do, then connect the cams, test whether it processes the images, and so on. Thank you all for your help.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
There is always an answer.
There is always a way.
There is always a reason. -SciTech02.
Let me delurk a bit before making my response. I'm a new member of the group and a newbie to robotics, but I am a graduate student in cognitive psychology and cognitive science, and I have some theoretical background in the link between what computers/robots do and what humans do. My focus is human language comprehension, but I have some knowledge of vision too.
SciTech02 wrote:
[snip] "Can you make the CMU cam work like a human eye?" [snip]
Meaning something like: would it be possible to use one (or two) CMU cams to make a robot see the world "like a human" and then do navigation based on this vision, "like a human"?
The issue I have is with the phrase "like a human". This makes the assumption that we actually have a good grasp of how a human does navigation, and that it is a reasonable way to make robots navigate. This is a common assumption even among people with PhDs who do machine vision, but it isn't really right.
OK, some background on cognitive science & cognitive psychology and why this assumption is a problem.
Back in the dark ages people invented serial computers, and then Turing, Newell, Simon and a bunch of others came along and said, hey, we can make them think, and in fact maybe human cognition is really equivalent to artificial intelligence in the end. The big assumption here was that human cognition was symbol manipulation: data is represented as a set of abstract symbols that are manipulated via syntactic rules. This is an important assumption for early AI because this is what computers can do really well. So if we can do cognition like this then we can do AI.
AI worked on the assumption that sensing the world was pretty easy and entailed mapping the "stuff in the world" onto a set of symbols in the brain. All the hard stuff was the symbol manipulation.
In the 1980s David Marr wrote a book entitled _Vision_ that epitomizes the old-style AI approach to machine vision and to how human vision might work. We start with a bunch of pixels in the retina. Using some simple algorithms (that are hardwired into the visual system in humans) we can turn the pixels into lines and corners, and from there into surfaces and objects. Once we see the objects we build a map of the world (what Marr calls a 2.5D representation of the world) and reason about it to do things like walk without tripping on the coffee table and catch baseballs. If you are interested in machine vision and how it links to human vision, this is really worth reading (although a difficult read).
But we have since realized this view is wrong in some important ways. Progress on symbol-manipulation systems moved really rapidly (chess-playing computers better than humans, mathematical theorem provers), while progress on the sensory input from the world really stank (speech recognition that is still pretty bad, machine vision that is still pretty bad, etc.). The hard vs. easy problems were reversed. An important part of human cognition really seems to be the sensory link to the world. Getting the data into a useful form is part of the cognition we do. This view is called embodied cognition, the idea being that part of doing human cognition is having human links to the world; for a good book on this see _Mindware_, by Andy Clark, a moderately difficult read.
So embodied cognition has a new set of assumptions to consider. Some of the relevant ones are:
 1) the human brain isn't a serial computer; it is a parallel and distributed system
 2) the human brain works via lots of small systems that don't need to know what other systems are up to
 3) object recognition is not the same as navigation
There is evidence to support all of this, and for our purposes the last point in particular. Once in the brain (the primary visual cortex), visual information splits into two very separate pathways, the dorsal pathway and the ventral pathway. Patients with brain damage (and surgically altered monkeys) allow us to see that these pathways are distinct and serve different functions. The ventral pathway is commonly called the "what" stream. Its job is to tell you what you are looking at (i.e., do object recognition, which is what machine vision wants to do). The dorsal pathway is the "where" stream and tells you that something is moving (or that you are moving towards something). Patients with damage to the ventral stream cannot see objects in the normal sense. If you were to hold up a baseball they could not tell you what it was, and in fact their perception is of being totally blind. But if you were to toss a baseball at them, they would duck! So the dorsal stream can do its thing _without_ object recognition: there's something flying at you, I don't know what it is, but duck! If you've seen Jurassic Park, recall when the kids are hiding from the T-Rex. They stand still and the T-Rex can't see them. The T-Rex (and other evolutionarily old visual systems) is like the dorsal path: no object recognition, just motion perception. If you are a T-Rex, you bite anything that moves, because it will be food; you don't really care if it's humans or chimps or other dinosaurs. Later on, object recognition evolved as a related but distinct visual system.
So seeing "like a human" isn't one type of seeing. Seeing like a human is task dependent. Are you picking an M&M to eat based on color or are you catching a baseball?
There's more to this issue, and there are lots of people with more knowledge than me about how to get a robot to do dorsal- or ventral-style processing, but I just want you to realize that the "like a human" way of doing things isn't what most people think it is. Feel free to ask (in the forum or off-line) for more information on the human parts of this equation.
Thanks for reading.
Breton Bienvenue
But I think the idea just discussed would work: treating each pixel (or group of pixels) as a sensor, then getting the distance by using the parallax method. With the X, Y and Z axes covered, it could map out an image and navigate on a higher level by actually "seeing" its environment and reacting to it.
Thank you for the information.
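A sketch of that last step: turning one pixel-plus-parallax measurement into an (X, Y, Z) point, using the same idealized stereo model as before (focal length f in pixels, baseline B, principal point assumed at the image center).

```python
def pixel_to_xyz(u: float, v: float, disparity: float,
                 f: float, baseline: float, cx: float, cy: float):
    """Back-project one image pixel with a known disparity into 3D."""
    if disparity <= 0:
        return None                  # no measurable shift: far field
    z = f * baseline / disparity     # depth, same relation as before
    x = (u - cx) * z / f             # sideways offset from the camera axis
    y = (v - cy) * z / f             # vertical offset from the camera axis
    return (x, y, z)
```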
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
There is always an answer.
There is always a way.
There is always a reason. -SciTech02.