Please include one sample message from each sensor input. Even without that, though, I can make some observations.
- For starters, please disable all `rejection_threshold` parameters. Those are advanced parameters that should only be used after you have your system working more or less as you want it, and are just looking to squash the odd outlier.
- Is the VO data in full 3D? I assume so. If I were you, I would not fuse two absolute pose sources unless you know they will always agree. In this case, your VO data will slowly diverge from the IMU orientation, so I'd just use roll and pitch from one of those sources, and then, if the sensor provides it, fuse velocity data from the other. This is just in reference to your odom-frame EKF.
- In your map frame EKF, you are fusing absolute pose data from your GPS and your odometry source. Assuming your VO data is reported in the odom frame, then every time the EKF receives a measurement, it has to transform that data to the map frame (using the very transform the EKF is creating), and then fuse it absolutely. And then that pose has to agree with your GPS. VO data will drift over time, so those are going to diverge. In this EKF, I would fuse only velocity data from the VO source, if possible.
- All that aside, I would expect this configuration to behave poorly, but not to produce NaNs. NaNs very often indicate issues with the sensor data itself, like ill-formed covariances. This is why you need to include sample sensor messages.
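
While you gather those sample messages, a quick sanity check on their covariance arrays may help narrow things down. The following is only a rough sketch: `covariance_problems` is a hypothetical helper of mine, not part of robot_localization, and the rules it applies (36 finite entries, positive diagonal, symmetry) are my assumptions about what "ill-formed" usually means, not the filter's own validation. It expects the flat, row-major 6x6 layout used by the `covariance` fields in ROS pose and twist messages.

```python
import math


def covariance_problems(cov, tol=1e-9):
    """Return a list of problems found in a flat, row-major 6x6 covariance.

    Hypothetical helper: the checks are necessary conditions for a usable
    covariance, not a full positive-definiteness test.
    """
    if len(cov) != 36:
        return [f"expected 36 entries, got {len(cov)}"]

    # NaN or infinite values make every other check meaningless, so stop here.
    if any(not math.isfinite(v) for v in cov):
        return ["contains NaN or infinite entries"]

    problems = []
    for i in range(6):
        # Variances live on the diagonal and should be strictly positive.
        if cov[i * 6 + i] <= 0.0:
            problems.append(f"diagonal entry {i} is not positive")
        # A covariance matrix must be symmetric.
        for j in range(i + 1, 6):
            if abs(cov[i * 6 + j] - cov[j * 6 + i]) > tol:
                problems.append(f"entries ({i},{j}) and ({j},{i}) differ")
    return problems


if __name__ == "__main__":
    # An all-zero covariance, which some drivers publish by default.
    for problem in covariance_problems([0.0] * 36):
        print(problem)
```

Paste the `covariance` array from each sample message into the function. Diagonal entries for variables you are not fusing can be ignored, but anything flagged for a variable you do fuse is worth fixing before looking further at the filter configuration.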