REGION-BASED MULTIMODAL IMAGE FUSION USING ICA BASES

Nedeljko Cvejic, John Lewis, David Bull, Nishan Canagarajah
Department of Electrical and Electronic Engineering
University of Bristol
Merchant Venturers Building, Woodland Road, Bristol BS8 1UB, United Kingdom
ABSTRACT

In this paper, we present a novel region-based multimodal image fusion algorithm in the ICA domain. It uses segmentation to determine the most important regions in the input images and consequently fuses the ICA coefficients from the given regions. The proposed method exhibits considerably higher performance than the basic ICA algorithm and shows improvement over other state-of-the-art algorithms.

Index Terms - Image fusion, joint segmentation, region-based fusion, independent component analysis
1. INTRODUCTION

Rapid advances in the areas of sensor technology and communication networks have led to a need for processing that can efficiently fuse information from different sensors into a single composite signal. Image and video fusion is a subarea of the more general topic of data fusion, dealing with image and video data [1]. Multi-sensor data often presents complementary information about a scene or object of interest, and thus image fusion provides an effective method for comparison and analysis of such data. There are several benefits of multi-sensor image fusion: wider spatial and temporal coverage, extended range of operation, increased robustness of system performance, and enhanced detection and classification capabilities.
The image fusion process can be performed at different levels of information representation: signal, pixel, feature and symbolic level. Feature-level fusion methods include region-based image fusion, in which the images to be fused are initially segmented into a set of distinctive regions. Various properties of the regions obtained by segmentation can be used to determine which features from which images are to be included in the fused image. This has advantages over pixel-based methods, as more intelligent semantic fusion rules can be considered based on actual features in the image, rather than on single or arbitrary groups of pixels.
Nikolov et al. [1] proposed a classification of image fusion algorithms into spatial domain and transform domain techniques. Instead of using a standard basis system, such as the DFT, the mother wavelet, or the cosine bases of the DCT, one can train a set of bases that are suitable for a specific type of images. A training set of image patches, acquired randomly from images of similar content, can be used to train a set of statistically independent bases. This is known as Independent Component Analysis (ICA) [2]. Recently, several algorithms have been proposed [3, 4] in which ICA bases are used for transform domain image fusion. In this paper, we refine the approach using a novel multimodal image fusion algorithm in the ICA domain. Segmentation is used to determine the most important regions in the input images and consequently the ICA coefficients are used to fuse the given regions.

This work has been funded by the UK Ministry of Defence Data and Information Fusion Defence Technology Centre.
2. BACKGROUND REVIEW
In order to obtain a set of statistically independent bases for image fusion in the ICA domain, training is performed with a predefined set of images. Training images are selected in such a way that the content and statistical properties are similar for the training images and the images to be fused. An input image i(x, y) is randomly windowed using a rectangular window w of size N x N. The result of windowing is an "image patch", which is defined as [3]:

p(m, n) = w * i(m0 - N/2 + m, n0 - N/2 + n)   (1)

where m and n take integer values from the interval [0, N-1]. Each image patch p(m, n) can be represented by a linear combination of a set of M basis patches b_i(m, n):

p(m, n) = Σ_{i=1}^{M} v_i b_i(m, n)   (2)
where v_1, v_2, ..., v_M stand for the projections of the original image patch on the basis patches, i.e. v_i = <p(m, n), b_i(m, n)>. A 2D representation of the image patches can be simplified to a 1D representation, using lexicographic ordering. This implies that an image patch p(m, n) is reshaped into a vector p, mapping all the elements from the image patch matrix to the vector in a row-wise fashion. Decomposition of image patches into a linear combination of basis patches can then be expressed as follows:

p(t) = Σ_{i=1}^{M} v_i(t) b_i = [b_1 b_2 ... b_M] [v_1(t) ... v_M(t)]^T   (3)

where t represents the image patch index. If we denote B = [b_1 b_2 ... b_M] and v(t) = [v_1 v_2 ... v_M]^T, then equation (3) reduces to:

p(t) = B v(t)   (4)

v(t) = B^{-1} p(t) = A p(t)   (5)

Thus, B = [b_1 b_2 ... b_M] represents an unknown mixing matrix (synthesis kernel) and A = [a_1 a_2 ... a_M]^T the unmixing matrix (analysis kernel). This transform projects the observed signal p(t) on a set of basis vectors. The aim is to estimate a finite set of K < N^2 basis vectors that will be capable of capturing most of the input image properties and structure.

1-4244-0481-9/06/$20.00 ©2006 IEEE. ICIP 2006.
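The windowing of eq. (1) and the lexicographic ordering described above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' code; `extract_patches` is a hypothetical helper name and the image is synthetic.

```python
import numpy as np

def extract_patches(image, N, num_patches, rng):
    """Randomly window an image into N x N patches (eq. 1) and
    lexicographically order each patch into a length-N^2 vector."""
    H, W = image.shape
    patches = np.empty((num_patches, N * N))
    for k in range(num_patches):
        # Random window centre (m0, n0), kept inside the image bounds.
        m0 = rng.integers(N // 2, H - N // 2 + 1)
        n0 = rng.integers(N // 2, W - N // 2 + 1)
        patch = image[m0 - N // 2:m0 + N // 2, n0 - N // 2:n0 + N // 2]
        patches[k] = patch.reshape(-1)  # row-wise (lexicographic) ordering
    return patches

rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64))           # stand-in training image
P = extract_patches(image, N=8, num_patches=100, rng=rng)
# Each row p of P can then be synthesised from basis vectors as p = B @ v (eq. 4).
```

Each vectorised patch is then ready for the PCA and ICA estimation steps that follow.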
In the first stage of basis estimation, Principal Component Analysis (PCA) is used for dimensionality reduction. This is obtained by eigenvalue decomposition of the data correlation matrix C = E{p p^T}. The eigenvalues of the correlation matrix illustrate the significance of their corresponding basis vectors. If V is the obtained K x N^2 PCA matrix, the input image patches are transformed by:

z(t) = V p(t)   (6)
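The PCA reduction of eq. (6) can be sketched as follows; this is a minimal NumPy illustration under the assumption that patches arrive as rows of a matrix, and `pca_reduce` is a hypothetical helper name, not the authors' implementation.

```python
import numpy as np

def pca_reduce(patches, K):
    """Estimate the K x N^2 PCA matrix V by eigendecomposition of the
    data correlation matrix C = E{p p^T}, keeping the K most
    significant eigenvectors."""
    C = patches.T @ patches / patches.shape[0]   # correlation matrix E{p p^T}
    eigvals, eigvecs = np.linalg.eigh(C)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:K]        # K largest eigenvalues first
    V = eigvecs[:, order].T                      # K x N^2 projection matrix
    return V

rng = np.random.default_rng(1)
patches = rng.standard_normal((1000, 64))        # toy 8x8 patches, vectorised
V = pca_reduce(patches, K=32)
z = V @ patches[0]                               # z(t) = V p(t), K-dimensional
```

Because `eigh` returns orthonormal eigenvectors of the symmetric matrix C, the rows of V form an orthonormal projection basis.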
After the PCA preprocessing step, we select the statistically independent basis vectors using optimisation of the negentropy. The following rule defines a FastICA approach that optimises negentropy, as proposed in [2]:
a_i^+ <- E{z φ(a_i^T z)} - E{φ'(a_i^T z)} a_i   (7)

A <- A (A^T A)^(-1/2)   (8)

where φ(x) = -∂G(x)/∂x defines the statistical properties G(x) = log p(x) of the signals in the transform domain [2]. In our implementation we used:

G(x) = α sqrt(x^2 + ε) + β   (9)

where α and β are constants and ε is a small constant to tackle numerical instability in the case that x ≈ 0 [2].

After the input image patches p(t) are transformed to their ICA domain representations v_k(t), we can perform image fusion in the ICA domain in the same manner as it is performed in, e.g., the wavelet domain. The equivalent vectors v_k(t) from each image are combined in the ICA domain to obtain a new image v_f(t). The method that combines the coefficients in the ICA domain is called the "fusion rule". After the composite image v_f(t) is constructed in the ICA domain, we can return to the spatial domain using the synthesis kernel.

In the scheme of [3], each patch of input image n is classified as active or non-active according to the energy E_n(t) of its ICA coefficients:

s_n(t) = { 1, if E_n(t) ≥ T
         { 0, otherwise   (11)

The segmentation maps of the input images are combined to form a single segmentation map, using the logical OR operator [3]:

s(t) = OR{s_1(t), s_2(t), ..., s_T(t)}   (12)

After the input images are segmented into active and non-active regions, two different fusion rules are used for fusion of each group of regions [3]. Namely, active regions are fused using the "max-abs" rule, while non-active regions are fused using the "mean" rule. The "max-abs" rule fuses two input coefficients/vectors by selecting the one with the higher absolute value. In the "mean" fusion rule, the fused coefficient/vector is equal to the mean value of the two input coefficients/vectors.
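The two fusion rules just described can be sketched directly in NumPy; the function names are illustrative, not from the paper.

```python
import numpy as np

def fuse_maxabs(v1, v2):
    """'max-abs' rule: keep the coefficient with the larger absolute value."""
    return np.where(np.abs(v1) >= np.abs(v2), v1, v2)

def fuse_mean(v1, v2):
    """'mean' rule: average the two input coefficients."""
    return 0.5 * (v1 + v2)

# Toy ICA coefficient vectors from two modalities.
v_ir = np.array([3.0, -0.2, 1.5])
v_vis = np.array([-4.0, 0.1, 1.0])

print(fuse_maxabs(v_ir, v_vis))  # [-4.   -0.2   1.5 ]
print(fuse_mean(v_ir, v_vis))    # [-0.5  -0.05  1.25]
```

The "max-abs" rule preserves salient high-energy structure (edges, hot IR objects), while the "mean" rule suppresses noise in low-activity regions.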
3. PROPOSED METHOD

In this paper we focus on the fusion of infra-red (IR) and visible images, although the method can be generalized to other modalities. Because the threshold that determines the "activity" of a region is set heuristically, the regions obtained by thresholding of the ICA coefficients do not always correspond to objects in the images to be fused. Our experiments showed that important objects in the IR input images (e.g. a person or a smaller object) are often masked by textured, high-energy background in the visual image. In this case the important objects from the IR image become blurred or, in extreme cases, completely masked. Therefore, we perform segmentation in the spatial domain and then fuse patches from separate regions separately. This differs from the methods in [3, 4], where the fusion was performed on a more general, pixel level.
3.1. The segmentation algorithm

The quality of the segmentation algorithm is of vital importance to the fusion process. We use an adapted version of the combined morphological-spectral unsupervised image segmentation algorithm described in [5], extended to handle multi-modal images. The algorithm works in two stages. The first stage produces an initial segmentation by using both textured and non-textured regions. The detail coefficients of the DT-CWT are used to process texture. The gradient function is applied to all levels and orientations of the DT-CWT coefficients and up-sampled to be combined with the gradient of the intensity information, giving a perceptual gradient. The larger gradients indicate possible edge locations. The watershed transform of the perceptual gradient gives an initial segmentation. The second stage uses these primitive regions to produce a graph representation of the image, which is processed using a spectral clustering technique.
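The watershed-of-gradient step can be illustrated with SciPy's image-foresting-transform watershed. This is only a toy sketch of that single step: a plain Sobel gradient stands in for the paper's DT-CWT-based perceptual gradient, and the seed placement is hand-picked rather than derived from morphology as in [5].

```python
import numpy as np
from scipy import ndimage as ndi

# Toy intensity image: a bright square on a dark background.
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0

# Gradient magnitude (stand-in for the perceptual gradient).
grad = np.hypot(ndi.sobel(img, axis=0), ndi.sobel(img, axis=1))
grad_u8 = (255 * grad / grad.max()).astype(np.uint8)

# One marker outside and one inside the object; the watershed floods
# the gradient image from these seeds, stopping at the edge ridge.
markers = np.zeros(img.shape, dtype=np.int16)
markers[0, 0] = 1     # background seed
markers[16, 16] = 2   # object seed
labels = ndi.watershed_ift(grad_u8, markers)
```

The resulting label image assigns every pixel to the basin of one of the seeds, producing the kind of primitive regions that the second (spectral clustering) stage would then merge.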
The method can use either intensity information or textural information, or both, to obtain the segmentation map. This flexibility is useful for multi-modal fusion where some a priori information about the sensor types is known. For example, IR images tend to lack textural information, with most features having a similar intensity value throughout the region. Therefore, we used an intensity-only segmentation map, as it gives better results than a texture-based segmentation.

The segmentation can be performed either separately or jointly. For separate segmentation, each of the input images generates an independent segmentation map:

S_1 = σ(i_1, D_1), ..., S_N = σ(i_N, D_N)   (13)

where D_n represents the detail coefficients of the DT-CWT used in segmentation. Alternatively, information from all images could be used to produce a joint segmentation map:

S_joint = σ(i_1, ..., i_N, D_1, ..., D_N)   (14)
In general, jointly segmented images work better for fusion [6]. This is because the segmentation map will contain a minimum number of regions to represent all the features in the scene most efficiently. A problem can occur for separately segmented images, where different images have different features, or features which appear at slightly different sizes in different modalities. Where regions partially overlap, if the overlapped region is incorrectly dealt with, artefacts will be introduced, and the extra regions created to deal with the overlap will increase the time taken to fuse the images.
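For separately generated maps, the simplest merge is the logical-OR combination of eq. (12). The sketch below applies it to toy binary "active" maps; the arrays and variable names are illustrative only.

```python
import numpy as np

# Toy binary segmentation maps from two separately segmented inputs
# (True = pixel belongs to an active/detected region in that modality).
s_ir = np.array([[1, 0],
                 [0, 0]], dtype=bool)
s_vis = np.array([[0, 0],
                  [1, 0]], dtype=bool)

# Eq. (12): a pixel is kept active if it is active in at least one modality.
s = np.logical_or(s_ir, s_vis)
print(s.astype(int))  # [[1 0]
                      #  [1 0]]
```

This is the fallback when a joint segmentation (eq. 14) is not available; as noted above, the joint map usually yields fewer, better-matched regions.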
3.2. Calculation of priority and fusion rules

After the images are jointly segmented, it is essential to determine the importance of the regions in each of the input images. We have decided to use the normalized Shannon entropy of a region as the priority. Thus, the priority P(r_tn) is given as:

P(r_tn) = (1 / |r_tn|) Σ_{(x,y) ∈ r_tn} d_n(o,l)^2(x, y) log d_n(o,l)^2(x, y)   (15)

with the convention 0 log(0) = 0, where |r_tn| is the size of the region r_tn in input image n and d_n(o,l)(x, y) ∈ D_n(o,l) are the detail coefficients of the DT-CWT used in segmentation. Finally, a mask M is generated that determines which image each region should come from in the fused image. An example of the IR input image, visual input image, the joint segmentation and the image fusion mask is given in Fig. 1.

Fig. 1. Segmentation and region selection prior to fusion. Top: IR input image (left), visible input image (right). Bottom: regions obtained by joint segmentation of the input images (left), image mask: white from IR, grey from visible (right).
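The priority computation and mask decision can be sketched in NumPy. This is a hedged reconstruction: the sign convention of the entropy in eq. (15) is as reconstructed above, the detail coefficients here are random stand-ins for DT-CWT subbands, and `region_priority` is a hypothetical helper name.

```python
import numpy as np

def region_priority(detail, region_mask):
    """Entropy-style priority of one region (eq. 15), computed over
    squared detail coefficients, with the convention 0 log 0 = 0."""
    d2 = detail[region_mask] ** 2
    terms = np.zeros_like(d2)
    nz = d2 > 0
    terms[nz] = d2[nz] * np.log(d2[nz])   # 0 log 0 contributes nothing
    return terms.sum() / region_mask.sum()  # normalize by region size |r|

rng = np.random.default_rng(2)
detail_ir, detail_vis = rng.standard_normal((2, 16, 16))  # toy subbands
region = np.zeros((16, 16), dtype=bool)
region[4:10, 4:10] = True                                 # one joint region

p_ir = region_priority(detail_ir, region)
p_vis = region_priority(detail_vis, region)
source = int(np.argmax([p_ir, p_vis]))  # 0: take region from IR, 1: from visible
```

Repeating the `argmax` decision over every region of the joint segmentation yields the fusion mask M shown in Fig. 1.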
4. EXPERIMENTAL RESULTS

The proposed image fusion method was tested in a multimodal scenario with two input images: infrared and visible. In order to make a comparison between the proposed method and the standard ICA method, the images were fused using the approach described in [3]. We compared these results with a simple averaging method, the ratio pyramid method, the Laplacian pyramid transform (LT) and the dual-tree complex wavelet transform (DT-CWT) [6]. In the multiresolution methods (LT, DT-CWT) a 5-level decomposition is used and fusion is performed by selecting the coefficient with the maximum absolute value, except for the case of the lowest-resolution subband, where the mean value is used. Before performing image fusion, the ICA bases were trained using a set of ten images with content comparable to the test set. The number of N x N patches (N = 8) used for training was 10000, randomly selected from the training set. The lexicographic ordering was
applied to the image patches and then PCA performed.

Table 1. Performance of the image fusion methods measured by standard fusion metrics.

Metric     Method    UN 1812   Trees 4917   Octec 22
Piella     Average    0.866      0.962       0.872
           Laplace    0.914      0.969       0.939
           DT-CWT     0.912      0.969       0.941
           Ratio      0.862      0.960       0.876
           ICA        0.872      0.962       0.889
           R-B ICA    0.921      0.974       0.940
Petrovic   Average    0.347      0.513       0.436
           Laplace    0.501      0.599       0.767
           DT-CWT     0.462      0.600       0.768
           Ratio      0.413      0.533       0.503
           ICA        0.415      0.539       0.613
           R-B ICA    0.548      0.636       0.784

Fig. 2. Top: input IR image (left) and input visible image (right); second row: fused image using averaging (left) and ratio pyramid (right); third row: fused image using DT-CWT (left) and LT (right); bottom row: fused image using standard ICA (left) and the region-based ICA method (right).

Following this, the 32 most important bases (K = 32) were selected, according to the eigenvalues corresponding to these bases. After that, the ICA update rule in (7) was iterated for L = 3 (3 x 3 neighbourhood) until convergence.
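The training iteration of eqs. (7)-(8) can be sketched as a symmetric FastICA loop. This is an assumption-laden illustration, not the authors' implementation: the paper's nonlinearity derived from G(x) in eq. (9) is replaced by the standard tanh contrast, the input is random stand-in data for the whitened z(t) vectors, and `fastica_symmetric` is a hypothetical name.

```python
import numpy as np

def fastica_symmetric(Z, K, iters=50, seed=0):
    """Symmetric FastICA on whitened data Z (K x T): the update of
    eq. (7) followed by the decorrelation A <- A (A^T A)^(-1/2) of
    eq. (8). Uses tanh as the nonlinearity phi."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((K, K))     # rows are the vectors a_i^T
    T = Z.shape[1]
    for _ in range(iters):
        Y = A @ Z                       # y_i(t) = a_i^T z(t) for all i
        g = np.tanh(Y)                  # phi(y)
        g_prime = 1.0 - g ** 2          # phi'(y)
        # Eq. (7): a_i^+ = E{z phi(a_i^T z)} - E{phi'(a_i^T z)} a_i
        A_new = (g @ Z.T) / T - np.diag(g_prime.mean(axis=1)) @ A
        # Eq. (8): symmetric decorrelation via (A A^T)^(-1/2) A
        w, E = np.linalg.eigh(A_new @ A_new.T)
        w = np.clip(w, 1e-12, None)     # guard against tiny eigenvalues
        A = E @ np.diag(w ** -0.5) @ E.T @ A_new
    return A

rng = np.random.default_rng(3)
Z = rng.standard_normal((4, 2000))      # stand-in for whitened z(t) vectors
A = fastica_symmetric(Z, K=4)
```

Because the decorrelation step of eq. (8) is applied last in every iteration, the returned unmixing matrix A is orthogonal by construction.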
For comparison, the ICA coefficients were first combined using the principle described in Section 2. The images to be fused were then segmented, regions and image masks determined for each of them, and ICA fusion performed on these regions using the "max-abs" fusion rule. Example input images and fused outputs are given in Fig. 2. It is clear that the fused image obtained using the proposed algorithm incorporates more detail from the visible image together with the important objects from the IR image, compared to the standard ICA method. The data presented in Table 1 confirms this conclusion, using both the Petrovic [8] and the Piella [7] metrics. The proposed method exhibits considerably higher performance than the basic ICA algorithm and improvement over other state-of-the-art algorithms.
5. REFERENCES

[1] R. Blum and Z. Liu, Multi-Sensor Image Fusion and Its Applications, CRC Press, London, UK, 2005.

[2] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, John Wiley and Sons, London, UK, 2001.

[3] N. Mitianoudis and T. Stathaki, "Pixel-based and region-based image fusion schemes using ICA bases," Information Fusion, to appear.

[4] N. Cvejic, D. Bull, and N. Canagarajah, "A novel ICA domain multimodal image fusion algorithm," in Proc. SPIE Defense and Security Symposium, Orlando, FL, to appear.

[5] R. O'Callaghan and D. Bull, "Combined morphological-spectral unsupervised image segmentation," IEEE Transactions on Image Processing, vol. 14, pp. 49-62, 2005.

[6] J. Lewis, R. O'Callaghan, S. Nikolov, D. Bull, and N. Canagarajah, "Pixel- and region-based image fusion with complex wavelets," Information Fusion, to appear.

[7] G. Piella and H. Heijmans, "A new quality metric for image fusion," in Proc. IEEE International Conference on Image Processing, Barcelona, Spain, 2003, pp. 173-176.

[8] C. Xydeas and V. Petrovic, "Objective pixel-level image fusion performance measure," in Proc. SPIE, Orlando, FL, 2000, pp. 88-99.