It sounds like you have the 3D point that describes the location of the camera, and since you have the IMU with it, you know its orientation as well. Since you also know the estimated location of the ground plane, this can be treated as a ray-plane intersection problem, which is a common problem in computer graphics: solving for the x,y,z point where a ray meets a plane.
The line that starts from the camera and points through the ground is the ray, and the ground or course is the plane. Since the object you're selecting/detecting in the camera image is not always at the center pixel, you'll need to add a pan and tilt angle to the ray depending on which pixel the center of the object corresponds to. The image_geometry package has tools to help do just that.
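To make the pixel-to-ray step concrete, here is a minimal sketch using a plain pinhole model. It assumes a rectified (undistorted) image and that you have the intrinsics fx, fy, cx, cy from your camera calibration; the function name pixel_to_ray is made up for illustration. As far as I know, image_geometry's PinholeCameraModel does the equivalent with projectPixelTo3dRay, so in a real node you would probably use that instead.

```python
import math

def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Return a unit direction vector for pixel (u, v).

    The vector is in the camera optical frame (x right, y down, z forward).
    Assumes a rectified image, i.e. no lens distortion.
    """
    # Offset from the principal point, scaled by the focal length,
    # gives the ray direction at depth z = 1.
    x = (u - cx) / fx
    y = (v - cy) / fy
    z = 1.0
    norm = math.sqrt(x * x + y * y + z * z)
    return (x / norm, y / norm, z / norm)

# The center pixel looks straight down the optical axis.
center_ray = pixel_to_ray(320, 240, 500.0, 500.0, 320.0, 240.0)
print(center_ray)  # (0.0, 0.0, 1.0)
```

The ray comes out in the camera's optical frame, so it still has to be rotated into whatever frame the ground plane is defined in before you intersect the two.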
As mig mentioned, you'll want to use tf to help keep track of all the transformation frames, and a urdf can make this even easier.
Once you know the ray, google "line-plane intersection". There is even a Wikipedia article about it to get you started. You might even get lucky searching for "how to do line-plane intersection in c++" (or Python, or whatever language you want to use) and find some ready-to-use code.
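As a rough sketch of what that intersection code tends to look like, here is a pure-Python version. It assumes the ray origin, ray direction, and plane are all expressed in the same frame (for example your map or odom frame), and that the plane is given as a point on it plus a normal vector. The function and helper names are my own, not from any ROS package.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def ray_plane_intersection(origin, direction, plane_point, plane_normal, eps=1e-9):
    """Return the (x, y, z) point where the ray hits the plane, or None.

    origin, direction: ray start and direction (direction need not be unit length).
    plane_point, plane_normal: any point on the plane and the plane's normal.
    """
    denom = dot(direction, plane_normal)
    if abs(denom) < eps:
        # Ray is parallel to the plane, so there is no single intersection point.
        return None
    t = dot(sub(plane_point, origin), plane_normal) / denom
    if t < 0:
        # The plane is behind the camera along this ray.
        return None
    return tuple(o + t * d for o, d in zip(origin, direction))

# Camera 2 m above a flat ground plane (z = 0), looking forward and down at 45 degrees.
hit = ray_plane_intersection(
    origin=(0.0, 0.0, 2.0),
    direction=(1.0, 0.0, -1.0),
    plane_point=(0.0, 0.0, 0.0),
    plane_normal=(0.0, 0.0, 1.0),
)
print(hit)  # (2.0, 0.0, 0.0)
```

The two None cases are worth handling explicitly: a ray that runs parallel to the ground or points above the horizon never lands on the plane, which will happen whenever the detected object is near or above the horizon line in the image.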