Transformation of poses between camera frame and base frames
Many variations of these questions seem to have been asked quite a few times, however, I was not able to find and compute the right answer for my problem.
I am trying out a custom picking of an object using a Robotic Arm. I have a separate perception module that detects objects and estimates the pose for grasping. It, however (as expected) is in the camera frame and follows image processing convention of coordinate frames, i.e. with right: +x-axis, forward: +z-axis, down: +y-axis
. From this perception module, I get two values - 3x3
Rotation matrix and 1x3
translation vector.
Example T1
:
Tra: [0.09014122 0.16243269 0.6211668 ]
Rot: [[ 0. 0.03210089 -0.99948463]
[ 0. 0.99948463 0.03210089]
[ 1. -0. 0. ]]
(i.e. I have to grasp at that location and in that orientation)
My robot to camera transform is understandably in the right hand coordinate system. Here is an example of the same, say T2
:
translation:
x: 0.0564581200121
y: 0.318823912978
z: 0.452250135698
rotation:
x: -0.6954818376
y: 0.693982204231
z: -0.13156524004
w: 0.13184954074
Now, to get the pose of the object from the robot, is a simple transformation T2 times T1
. However, T1
follows a different convention from T2.
How would I go about this ? A detailed explanation using this example will be highly appreciated! I am trying to understand from scratch, hence I would prefer to arrive at a transformation matrix on my own to apply to the above ones to get the final pose
Additional note: The robot to camera transformation is using the depth_camera_optical_frame
to base_link
. Should I use this or just depth_camera_frame
?
I have checked out similar issues like 33091, gazebo-4226, and other ones. However, all of them seem to give solutions for 3D points, and I could not find any that is applicable for the pose transformation matrix.