Abstract: This paper studies the task of grasping any object from known categories according to free-form language instructions. This task demands techniques from computer vision, natural language ...