I'm planning to mount two cameras side by side (left and right, pointing in parallel) on the base of a mobile robot, stitch the images from the two camera topics together, and run the stitched image through a CNN to estimate the pose of objects in the scene. My question is how image stitching will affect detection when an object can only be seen in full by combining both camera views.
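
To make the setup concrete, a minimal sketch of the stitching step might look like the following. It assumes both frames are already rectified, have the same height, are 8-bit images, and overlap by a known, fixed number of columns. The function name `stitch_with_overlap` and the parameter `overlap_px` are hypothetical and only for illustration. A real pipeline would more likely estimate the alignment from image features (for example with OpenCV's `Stitcher` class) rather than hard-code it.

```python
import numpy as np


def stitch_with_overlap(left, right, overlap_px):
    """Naively stitch two same-height uint8 images sharing `overlap_px` columns.

    Assumption: the last `overlap_px` columns of `left` show the same scene
    region as the first `overlap_px` columns of `right`.
    """
    if left.shape[0] != right.shape[0] or left.shape[2:] != right.shape[2:]:
        raise ValueError("images must have the same height and channel count")
    if not 0 < overlap_px <= min(left.shape[1], right.shape[1]):
        raise ValueError("overlap_px must be between 1 and the image width")

    left_f = left.astype(np.float32)
    right_f = right.astype(np.float32)

    # Blend weights for the left image fall linearly from 1 to 0 across the overlap.
    alpha = np.linspace(1.0, 0.0, overlap_px)
    alpha = alpha.reshape(1, overlap_px, *([1] * (left.ndim - 2)))

    blended = alpha * left_f[:, -overlap_px:] + (1.0 - alpha) * right_f[:, :overlap_px]

    # Left-only region, blended overlap, right-only region.
    out = np.concatenate(
        [left_f[:, :-overlap_px], blended, right_f[:, overlap_px:]], axis=1
    )
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


# Synthetic stand-ins for the two camera frames (height 4, width 6, 3 channels).
left_frame = np.full((4, 6, 3), 200, dtype=np.uint8)
right_frame = np.full((4, 6, 3), 0, dtype=np.uint8)
panorama = stitch_with_overlap(left_frame, right_frame, overlap_px=2)
```

The blended overlap is exactly where an object spanning both views would sit, so any misalignment or ghosting in that band is what the CNN would have to cope with.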

What packages or processes already exist for extracting objects from stitched images, and how accurate are they?

