Local invariant feature algorithms applied to downward-looking image registration usually compute the camera's pose relative to visual landmarks. Image registration with these approaches generally faces three requirements. First, such algorithms are easily affected by illumination, so robustness to lighting changes is needed. Second, the algorithm should have low computational complexity. Third, the depth information of images must be estimated without other sensors. This paper investigates the well-known local invariant feature speeded-up robust features (SURF) and proposes a high-speed, robust image registration and localization algorithm based on it. Supported by feature tracking and pose estimation methods, the proposed algorithm can compute camera poses under different conditions of scale, viewpoint, and rotation, and thereby precisely localize an object's position. The study then carries out registration experiments with scale-invariant feature transform (SIFT), SURF, and the proposed algorithm, and designs a method to evaluate their performance. Furthermore, it performs an object retrieval test on remote sensing video. Because remote sensing frames exhibit large deformation, the registration algorithm incorporates Kanade-Lucas-Tomasi (KLT) feature tracking and 3-D coplanar calibration methods, which allows targets of interest to be localized precisely and efficiently. The experimental results show that the proposed method achieves a higher localization speed and a lower localization error rate than traditional visual simultaneous localization and mapping (vSLAM) over a given period of time.
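
To make the registration step concrete, the following minimal sketch estimates the scale, rotation, and translation that align matched keypoints between two images and reports the residual error. It is an illustration under simplifying assumptions, not the algorithm proposed in this paper: it assumes the matches are already available (for example from SURF descriptors), that a 2-D similarity transform is an adequate model for a downward-looking view, and that there are no outliers. The function names `fit_similarity` and `registration_error` are hypothetical.

```python
import cmath
import math


def fit_similarity(src, dst):
    """Least-squares 2-D similarity transform mapping src points onto dst.

    Points are (x, y) tuples treated as complex numbers, so the model is
    q = a * p + b with complex a (scale and rotation) and b (translation).
    Returns (scale, rotation_in_radians, (tx, ty)).
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matched point pairs")
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    n = len(p)
    p_mean = sum(p) / n
    q_mean = sum(q) / n
    # Closed-form least-squares solution of the complex linear fit.
    num = sum((pi - p_mean).conjugate() * (qi - q_mean) for pi, qi in zip(p, q))
    den = sum(abs(pi - p_mean) ** 2 for pi in p)
    if den == 0:
        raise ValueError("source points are all identical")
    a = num / den
    b = q_mean - a * p_mean
    return abs(a), cmath.phase(a), (b.real, b.imag)


def registration_error(src, dst, scale, rotation, translation):
    """Root-mean-square distance between transformed src points and dst."""
    a = cmath.rect(scale, rotation)
    b = complex(*translation)
    sq = [abs(a * complex(*s) + b - complex(*d)) ** 2 for s, d in zip(src, dst)]
    return math.sqrt(sum(sq) / len(sq))


# Hypothetical matches: dst is src scaled by 2, rotated by 90 degrees,
# and shifted by (3, -1).
src_pts = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst_pts = [(3, -1), (3, 1), (1, -1), (1, 1)]
scale, rotation, translation = fit_similarity(src_pts, dst_pts)
rms = registration_error(src_pts, dst_pts, scale, rotation, translation)
```

In practice the matches would contain outliers, so a fit of this kind would normally sit inside a robust estimator, and a homography or full pose model would replace the similarity transform when the viewpoint changes significantly.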