There's no magic solution, my friend. You have to stick with it and see it through to the end.
To answer the second part:
There are deep neural networks that do object detection. Pylearn2 implements a lot of these deep neural nets with easy-to-use config files.
There are tutorials on pylearn2: deeplearning.net/software/pylearn2/
I have code that implements a two-layer HOG + SVM pipeline followed by a classification step. It doesn't work very well, but it's a start. It lives in a ROS workspace but has not been wrapped as a ROS node yet. You can look through it to see how the keypoints work for the situation you describe above.
https://github.com/thunderbots/athome...
Here is how the keypoints are extracted:
https://github.com/thunderbots/athome...
It depends heavily on your performance/speed constraints. In my experience, keypoints work quite well (SIFT and SURF being the most accurate but slowest). Are you doing keypoint detection, descriptor extraction, and keypoint matching with outlier removal (using RANSAC)?
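To make the outlier-removal step concrete, here is a minimal sketch of RANSAC on already-matched keypoints. It is plain Python and assumes the two images differ only by a 2D translation, which is a simplification of mine: real scenes usually need a homography or fundamental matrix. The function name `ransac_translation`, its parameters, and the synthetic matches are all made up for illustration.

```python
import random


def ransac_translation(matches, threshold=3.0, iterations=200, seed=0):
    """Return (best_shift, inlier_indices) for matches [((x1, y1), (x2, y2)), ...].

    Assumes a pure 2D translation between the two images (illustrative only).
    """
    rng = random.Random(seed)
    best_shift, best_inliers = None, []
    for _ in range(iterations):
        # A translation needs only one match as its minimal sample.
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        # A match is an inlier if the hypothesised shift maps its first
        # point to within `threshold` pixels of its second point.
        inliers = [
            i for i, ((ax, ay), (bx, by)) in enumerate(matches)
            if ((ax + dx - bx) ** 2 + (ay + dy - by) ** 2) ** 0.5 <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_shift, best_inliers = (dx, dy), inliers
    return best_shift, best_inliers


# Synthetic matches: eight consistent with a shift of (10, 5), two wrong.
points = [(0, 0), (4, 9), (12, 3), (20, 20), (7, 15), (30, 2), (18, 11), (25, 27)]
good = [((float(x), float(y)), (x + 10.0, y + 5.0)) for x, y in points]
bad = [((5.0, 5.0), (80.0, 40.0)), ((14.0, 22.0), (3.0, 90.0))]
matches = good + bad

shift, inliers = ransac_translation(matches)
print(shift, inliers)
```

The two bad matches each agree only with themselves, so they never win the consensus vote. With a richer model (e.g. a homography) the loop is the same, only the minimal sample size and the error function change.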
I can only add that I had satisfactory results using HOG features in combination with an SVM classifier.
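For anyone unsure what that combination looks like, here is a heavily simplified sketch in plain Python. It is not anyone's actual code from this thread. The descriptor is a single-cell orientation histogram (real HOG uses a grid of cells with block normalisation), and the classifier is a linear SVM trained by subgradient descent on the hinge loss. All names, hyperparameters, and the toy edge patches are assumptions for illustration.

```python
import math


def hog_descriptor(img, bins=9):
    """One-cell HOG-style histogram over the whole patch, L2-normalised."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # central differences
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle / (180.0 / bins)) % bins] += mag
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]


def train_linear_svm(X, y, epochs=200, lr=0.1, reg=0.01):
    """Subgradient descent on the L2-regularised hinge loss; labels are +1/-1."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Regularisation shrinks w on every step; the hinge term only
            # acts on samples inside the margin or misclassified.
            w = [wj - lr * reg * wj for wj in w]
            if margin < 1:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def vertical_edge(pos, size=8):
    return [[1.0 if x >= pos else 0.0 for x in range(size)] for _ in range(size)]


def horizontal_edge(pos, size=8):
    return [[1.0 if y >= pos else 0.0 for _ in range(size)] for y in range(size)]


# Toy training set: vertical edges are class +1, horizontal edges class -1.
train_pos = (2, 3, 4, 5)
X = [hog_descriptor(vertical_edge(p)) for p in train_pos] + \
    [hog_descriptor(horizontal_edge(p)) for p in train_pos]
y = [1] * len(train_pos) + [-1] * len(train_pos)
w, b = train_linear_svm(X, y)

# Edge position 6 was not seen during training.
pred_vertical = predict(w, b, hog_descriptor(vertical_edge(6)))
pred_horizontal = predict(w, b, hog_descriptor(horizontal_edge(6)))
print(pred_vertical, pred_horizontal)
```

In practice you would take the descriptor and the SVM from an existing library rather than hand-rolling them. The sketch only shows how the two pieces connect: gradient-orientation histograms go in, a linear decision comes out.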