Regulating Modality Utilization within Multimodal Fusion Networks

dc.contributor.author: Singh, Saurav
dc.contributor.author: Saber, Eli
dc.contributor.author: Markopoulos, Panos P.
dc.contributor.author: Heard, Jamison
dc.date.accessioned: 2024-09-27T13:18:44Z
dc.date.available: 2024-09-27T13:18:44Z
dc.date.issued: 2024-09-19
dc.date.updated: 2024-09-27T13:18:45Z
dc.description.abstract: Multimodal fusion networks play a pivotal role in leveraging diverse sources of information for enhanced machine learning applications in aerial imagery. However, current approaches often suffer from a bias towards certain modalities, diminishing the potential benefits of multimodal data. This paper addresses this issue by proposing a novel modality utilization-based training method for multimodal fusion networks. The method guides the network's utilization of its input modalities, ensuring a balanced integration of complementary information streams and effectively mitigating the overutilization of dominant modalities. The method is validated on multimodal aerial imagery classification and image segmentation tasks, maintaining modality utilization within ±10% of the user-defined target utilization and demonstrating the versatility and efficacy of the proposed method across various applications. Furthermore, the study explores the robustness of the fusion networks against noise in input modalities, a crucial aspect in real-world scenarios. The method shows better noise robustness by maintaining performance amidst environmental changes affecting different aerial imagery sensing modalities. The network trained with 75.0% EO utilization achieves significantly better accuracy (81.4%) in noisy conditions (noise variance = 0.12) than a traditionally trained network with 99.59% EO utilization (73.7%). It also maintains an average accuracy of 85.0% across different noise levels, outperforming the traditional method's average accuracy of 81.9%. Overall, the proposed approach presents a significant step towards harnessing the full potential of multimodal data fusion in diverse machine learning applications such as robotics, healthcare, satellite imagery, and defense.
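
The paper's exact regulation mechanism is not described in this record, so the following is a minimal sketch of the general idea only: a two-branch fusion network (hypothetical EO and SAR feature inputs) whose per-modality utilization is approximated by a learned softmax gate, trained with a penalty that steers the EO gate weight toward a user-defined target such as 0.75. The names (GatedFusionNet, train_step, lam), the gate-based utilization proxy, and the squared-error penalty are all assumptions for illustration, not the authors' formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedFusionNet(nn.Module):
    """Two-branch fusion with a learned softmax gate over modalities."""
    def __init__(self, eo_dim=32, sar_dim=32, hidden=64, num_classes=10):
        super().__init__()
        self.eo_branch = nn.Sequential(nn.Linear(eo_dim, hidden), nn.ReLU())
        self.sar_branch = nn.Sequential(nn.Linear(sar_dim, hidden), nn.ReLU())
        self.gate_logits = nn.Parameter(torch.zeros(2))  # gates for [EO, SAR]
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, eo, sar):
        # Softmax gate sums to 1; used here as a crude utilization proxy.
        w = torch.softmax(self.gate_logits, dim=0)
        fused = w[0] * self.eo_branch(eo) + w[1] * self.sar_branch(sar)
        return self.head(fused), w

def train_step(model, optimizer, eo, sar, labels, target_eo=0.75, lam=1.0):
    logits, w = model(eo, sar)
    task_loss = F.cross_entropy(logits, labels)
    # Hypothetical penalty: pull EO utilization toward the user-defined target.
    util_loss = (w[0] - target_eo) ** 2
    loss = task_loss + lam * util_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item(), w.detach()

# Toy usage with random data:
model = GatedFusionNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
eo, sar = torch.randn(8, 32), torch.randn(8, 32)
labels = torch.randint(0, 10, (8,))
loss, w = train_step(model, opt, eo, sar, labels)

In this toy setup the gate weight directly trades off against the target, whereas the paper regulates utilization measured on the trained network; the sketch is only meant to make the "penalize deviation from a target utilization" idea concrete.
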
dc.description.department: Electrical and Computer Engineering
dc.description.department: Computer Science
dc.identifier: doi: 10.3390/s24186054
dc.identifier.citation: Sensors 24 (18): 6054 (2024)
dc.identifier.uri: https://hdl.handle.net/20.500.12588/6631
dc.title: Regulating Modality Utilization within Multimodal Fusion Networks

Files

Original bundle

Name: sensors-24-06054.pdf
Size: 10.65 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.86 KB
Description: Item-specific license agreed upon to submission