Please check the ICP implementation; I suspect it uses multi-threading or GPU acceleration. For comparison, NDT-D2D registration between two sets of about 3,000 Gaussian components, using multi-threaded processing on an Intel Core i7 CPU, takes about 0.2 to 0.3 seconds.