Need for a reliable visual-inertial (VI) initialisation that provides an accurate initial state estimate
Both the tracking (front end) and BA (back end) keep some states fixed in their optimisations, which can bias the solution if those states are inaccurate -> hence the need for a good initialisation
Fixed states: states which are not arguments of the optimisation function, i.e. held constant (not optimised)
The optimal solution for initialising all the required variables [scale, gravity, biases, velocities, structure of the optimisation graph, camera poses] would require a full BA; instead, the problem is split into smaller steps
The proposed initialisation is general and applicable to any keyframe-based monocular SLAM
Requirement: any two consecutive keyframes must be close in time (to limit the noise accumulated by IMU integration)
Process the first few seconds of video with visual monocular SLAM (here using ORB-SLAM)
This yields a structure estimate as well as several keyframe poses, all up to an unknown scale factor
Use a motion that makes all variables observable
Compute gyroscope bias from orientation of keyframes
Initial guess for scale and gravity vector (accelerometer bias neglected at this stage)
Estimate the accelerometer bias and refine the scale and gravity direction (using known magnitude of gravity 9.81 m/s²)
Get velocities for all keyframes
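The initial scale estimate in the steps above can be sketched as a linear least-squares problem, solved here jointly with the gravity vector and with the accelerometer bias neglected. This is a minimal sketch, not the paper's exact formulation. It assumes that the camera and IMU body frames coincide (no extrinsic offset), and that keyframes obey p_{i+1} = p_i + v_i dt + 0.5 g dt² + R_i dp_i and v_{i+1} = v_i + g dt + R_i dv_i, where dp_i and dv_i are preintegrated IMU increments. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def solve_scale_gravity(p_cam, R_wb, dt, dp, dv):
    """Estimate scale s and world gravity g from keyframe triplets.

    p_cam : (N, 3) keyframe positions from monocular SLAM (unknown scale)
    R_wb  : (N, 3, 3) keyframe orientations (world <- body)
    dt    : (N-1,) time between consecutive keyframes
    dp, dv: (N-1, 3) preintegrated position / velocity increments
    Assumption: camera and body frames coincide, accelerometer bias is zero.
    """
    rows, rhs = [], []
    for i in range(len(p_cam) - 2):
        dt12, dt23 = dt[i], dt[i + 1]
        R1, R2 = R_wb[i], R_wb[i + 1]
        # Eliminating v_i and v_{i+1} from two position equations and one
        # velocity equation leaves: lam * s + beta @ g = gamma
        lam = (p_cam[i + 2] - p_cam[i + 1]) * dt12 - (p_cam[i + 1] - p_cam[i]) * dt23
        beta = -0.5 * dt12 * dt23 * (dt12 + dt23) * np.eye(3)
        gamma = (R2 @ dp[i + 1]) * dt12 - (R1 @ dp[i]) * dt23 + (R1 @ dv[i]) * dt12 * dt23
        rows.append(np.column_stack([lam, beta]))  # 3x4 block per triplet
        rhs.append(gamma)
    x, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)
    return x[0], x[1:]

def simulate_keyframes(s_true, g_true, n=6, seed=0):
    """Hypothetical synthetic keyframes that satisfy the motion model exactly."""
    rng = np.random.default_rng(seed)
    dt = rng.uniform(0.2, 0.4, n - 1)
    dp = rng.normal(size=(n - 1, 3))
    dv = rng.normal(size=(n - 1, 3))
    R_wb = np.empty((n, 3, 3))
    for i in range(n):
        q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
        R_wb[i] = q * np.sign(np.linalg.det(q))  # force det = +1
    p = np.zeros((n, 3))
    v = rng.normal(size=3)
    for i in range(n - 1):
        p[i + 1] = p[i] + v * dt[i] + 0.5 * g_true * dt[i] ** 2 + R_wb[i] @ dp[i]
        v = v + g_true * dt[i] + R_wb[i] @ dv[i]
    # Monocular SLAM would report the positions divided by the true scale.
    return p / s_true, R_wb, dt, dp, dv

g_true = np.array([0.0, 0.0, -9.81])
s_est, g_est = solve_scale_gravity(*simulate_keyframes(2.5, g_true))
```

Each triplet of consecutive keyframes contributes three equations in four unknowns, so at least two triplets (four keyframes) are needed, and the motion must excite the system enough for the stacked matrix to have full rank.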
When reinitialising after relocalisation (after a long time; using place recognition):
Reinitialise bg gyroscope bias
Scale s and gravity g already known from first initialisation, so no need to calculate anew
Estimate ba accelerometer bias from the same equation used during initialisation (simplified now due to knowledge of s, g)
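With s and g known, the accelerometer bias estimate reduces to a small linear least-squares problem. The sketch below illustrates this under stated assumptions and is not the paper's exact equation. It assumes that the camera and IMU body frames coincide, and that the preintegrated increments depend on the bias to first order (dp + J_p ba, dv + J_v ba). The function name, the Jacobians J_p and J_v, and the synthetic data are hypothetical.

```python
import numpy as np

def solve_accel_bias(s, g_w, p_cam, R_wb, dt, dp, dv, J_p, J_v):
    """Estimate accelerometer bias ba with scale s and gravity g_w known.

    dp, dv  : (N-1, 3) increments preintegrated with zero bias
    J_p, J_v: (N-1, 3, 3) Jacobians of dp, dv w.r.t. the accelerometer bias
    Assumption: camera and body frames coincide.
    """
    rows, rhs = [], []
    for i in range(len(p_cam) - 2):
        dt12, dt23 = dt[i], dt[i + 1]
        R1, R2 = R_wb[i], R_wb[i + 1]
        lam = (p_cam[i + 2] - p_cam[i + 1]) * dt12 - (p_cam[i + 1] - p_cam[i]) * dt23
        gamma = (R2 @ dp[i + 1]) * dt12 - (R1 @ dp[i]) * dt23 + (R1 @ dv[i]) * dt12 * dt23
        # Bias-dependent part of the increments, collected into phi @ ba
        phi = R2 @ J_p[i + 1] * dt12 - R1 @ J_p[i] * dt23 + R1 @ J_v[i] * dt12 * dt23
        rows.append(phi)
        rhs.append(s * lam - 0.5 * dt12 * dt23 * (dt12 + dt23) * g_w - gamma)
    b_a, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)
    return b_a

# Hypothetical synthetic data that satisfies the model exactly.
rng = np.random.default_rng(1)
n, s, g_w = 6, 2.5, np.array([0.0, 0.0, -9.81])
b_true = np.array([0.05, -0.02, 0.10])
dt = rng.uniform(0.2, 0.4, n - 1)
dp, dv = rng.normal(size=(2, n - 1, 3))
J_p, J_v = 0.1 * rng.normal(size=(2, n - 1, 3, 3))
R_wb = np.empty((n, 3, 3))
for i in range(n):
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    R_wb[i] = q * np.sign(np.linalg.det(q))  # force det = +1
p = np.zeros((n, 3))
v = rng.normal(size=3)
for i in range(n - 1):
    dp_true = dp[i] + J_p[i] @ b_true  # increments corrected for the bias
    dv_true = dv[i] + J_v[i] @ b_true
    p[i + 1] = p[i] + v * dt[i] + 0.5 * g_w * dt[i] ** 2 + R_wb[i] @ dp_true
    v = v + g_w * dt[i] + R_wb[i] @ dv_true

b_est = solve_accel_bias(s, g_w, p / s, R_wb, dt, dp, dv, J_p, J_v)
```

Because s and g are fixed, only three unknowns remain, which is why the reinitialisation step is cheaper than the first initialisation.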