The increasing availability and deployment of imaging sensors operating in multiple spectral bands has led to a large research effort in image fusion, resulting in a plethora of pixel-level image fusion algorithms. However, the cognitive aspects of multisensor image fusion have not received much attention in the development of these methods. In this study we investigate how humans interpret visual and infrared images, and we compare the interpretation of these individual image modalities to that of their fused counterparts, for different image fusion schemes. This was done to test to what degree image fusion schemes can enhance human perception of the structural layout and composition of realistic outdoor scenes. We asked human observers to manually segment the details they perceived as most prominent in a set of corresponding visual, infrared and fused images. For each scene, the segmentations of the individual input image modalities were used to derive a joint reference ("gold standard") contour image that represents the visually most salient details from both of these modalities for that particular scene. The resulting reference images were then used to evaluate the manual segmentations of the fused images, using a precision-recall measure as the evaluation criterion. In this sense, the best fusion method provides the largest number of correctly perceived details (originating from each of the individual modalities that were used as input for the fusion scheme) and the smallest number of false alarms (fusion artifacts or illusory details).
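As an illustration of the evaluation criterion, the following minimal sketch computes precision and recall for a manual segmentation of a fused image against a reference contour image. It rests on assumptions that are not taken from the study: contour images are represented as binary arrays, matching is strictly pixel-wise with no spatial tolerance, and the function names and toy data are hypothetical. The matching procedure actually used in the study may differ.

```python
import numpy as np


def precision_recall(reference, segmentation):
    """Pixel-wise precision and recall of a binary contour map.

    Assumption: exact pixel matching, no spatial tolerance.
    """
    ref = np.asarray(reference, dtype=bool)
    seg = np.asarray(segmentation, dtype=bool)
    if ref.shape != seg.shape:
        raise ValueError("contour maps must have the same shape")

    # Contour pixels present in both maps: correctly perceived details.
    true_pos = np.count_nonzero(ref & seg)
    n_seg = np.count_nonzero(seg)  # all pixels marked in the segmentation
    n_ref = np.count_nonzero(ref)  # all pixels in the reference

    # Precision falls with false alarms; recall falls with missed details.
    precision = true_pos / n_seg if n_seg else 0.0
    recall = true_pos / n_ref if n_ref else 0.0
    return precision, recall


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical toy data: a 4x4 reference contour and one segmentation
# of a fused image with one missed pixel and one false alarm.
reference = np.array([[0, 1, 1, 0],
                      [0, 1, 0, 0],
                      [0, 1, 0, 0],
                      [0, 0, 0, 0]])
fused_segmentation = np.array([[0, 1, 1, 0],
                               [0, 1, 0, 0],
                               [0, 0, 0, 1],
                               [0, 0, 0, 0]])

p, r = precision_recall(reference, fused_segmentation)
f = f_measure(p, r)
```

In this toy case three of the four marked pixels coincide with the reference, so both precision and recall are 0.75. Under this reading, a fusion scheme scores well when observers' segmentations of its output recover many reference details (high recall) while containing few details absent from the reference (high precision).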
A comparison with an objective score of subject performance indicates that the reference contour method indeed appears to characterize the performance of observers using the results of the fusion schemes. The results show that this evaluation method can provide valuable insight into the way fusion schemes combine perceptually important details from the individual input image modalities. Given a reference contour image, the method can potentially be used to design image fusion schemes that are optimally tuned to human visual perception for different applications and scenarios (e.g. environmental or weather conditions). (C) 2009 Elsevier B.V. All rights reserved.