
Cross-Modality Safety Alignment


1Fudan University, 2National University of Singapore

†Corresponding Author

An example of SIUO (Safe Inputs but Unsafe Output): the input consists of a safe image and safe text, but their semantic combination is unsafe. Such inputs can easily prompt LVLMs to generate unsafe responses.

🔔News

🚀[2024-06-12]: Excited to share our new benchmark on cross-modality safety alignment on GitHub!🌟

Introduction

As Artificial General Intelligence (AGI) becomes increasingly integrated into various facets of human life, ensuring the safety and ethical alignment of such systems is paramount. Previous studies have primarily focused on single-modality threats, which may not suffice given the integrated and complex nature of cross-modality interactions. We introduce a novel safety alignment challenge called Safe Inputs but Unsafe Output (SIUO) to evaluate cross-modality safety alignment. Specifically, it considers cases where each modality is safe independently but the combination could lead to unsafe or unethical outputs. To empirically investigate this problem, we developed SIUO, a cross-modality benchmark encompassing 9 critical safety domains, such as self-harm, illegal activities, and privacy violations. Our findings reveal substantial safety vulnerabilities in both closed- and open-source LVLMs, such as GPT-4V and LLaVA, underscoring the inability of current models to reliably interpret and respond to complex, real-world scenarios.

SIUO Benchmark

Overview

We introduce the Safe Inputs but Unsafe Output (SIUO) benchmark, meticulously curated to assess the cross-modality safety alignment capability of LVLMs. It covers nine safety domains across disciplines: Self-Harm, Dangerous Behavior, Morality, Illegal Activities & Crime, Controversial Topics & Politics, Discrimination & Stereotyping, Religious Beliefs, Information Misinterpretation, and Privacy Violation.


SIUO is designed to evaluate three essential dimensions in multimodal models: integration, knowledge, and reasoning. Our goal is to assess how effectively these models can integrate information from various modalities, align with human values through substantial knowledge, and apply ethical reasoning to predict outcomes and ensure user safety. This comprehensive evaluation ensures that models can meet the stringent safety standards required in real-world applications.


Statistics

The SIUO dataset comprises 167 meticulously crafted test cases, with an average length of 27.2 words, covering nine critical safety domains. To ensure that our examples capture greater diversity within each safety domain, we further decompose the domains into safety subclasses. For example, self-harm is divided into suicide, NSSI, and unhealthy habits. The statistics for the safety domains and safety subclasses are shown in the figure below.

SIUO covers 9 safety domains and 33 subcategories.
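As a concrete illustration of the dataset layout, the sketch below tallies per-domain and per-subclass counts from an annotation file. The file name `siuo.json` and the field names `category` and `sub-category` are assumptions for illustration; consult the GitHub release for the actual schema.

```python
# A minimal sketch of tallying SIUO's safety-domain statistics.
# The file name and field names below are hypothetical placeholders,
# not the released schema.
import json
from collections import Counter

with open("siuo.json", encoding="utf-8") as f:
    cases = json.load(f)  # expected: a list of test-case dicts

# Count cases per safety domain and per (domain, subclass) pair.
domains = Counter(c["category"] for c in cases)
subclasses = Counter((c["category"], c["sub-category"]) for c in cases)

print(f"{len(cases)} cases across {len(domains)} domains "
      f"and {len(subclasses)} subclasses")
for domain, count in domains.most_common():
    print(f"  {domain}: {count}")
```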

Experiment Results

Leaderboard

We evaluate various LVLMs, including both closed- and open-source models. Our evaluation is conducted in a zero-shot setting to assess the capability of models to generate accurate answers without fine-tuning or few-shot demonstrations on our benchmark. For all models, we use each model's default prompt for multiple-choice or open-ended QA, where available.
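For reference, the sketch below illustrates this zero-shot query pattern: a single image-text pair is sent to the model with no few-shot examples and default decoding. It uses the OpenAI Python SDK as one concrete backend; the helper name `query_lvlm` and the choice of `gpt-4o` are illustrative, not the authors' exact evaluation harness.

```python
# A minimal sketch of a zero-shot LVLM query: one image plus one
# instruction, no few-shot demonstrations, default decoding settings.
# Requires the OpenAI Python SDK (>= 1.0); any LVLM API can be
# substituted here.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def query_lvlm(image_path: str, question: str, model: str = "gpt-4o") -> str:
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```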

| Model | Size | Date | Safe (%) | Effective (%) | Safe & Effective (%) | Multi-Choice (%) |
|---|---|---|---|---|---|---|
| GPT-4V(ision) | - | 2024-04-29 | 53.29 | 69.46 | 23.35 | 38.92 |
| GPT-4o | - | 2024-05-06 | 50.90 | 95.81 | 46.71 | 41.32 |
| Gemini 1.5 Pro | - | 2024-04-30 | 52.10 | 91.62 | 45.51 | 47.31 |
| LLaVA-1.6-34B | 34B | 2024-04-29 | 40.72 | 95.81 | 37.13 | 52.69 |
| Gemini 1.0 Pro | - | 2024-04-29 | 27.54 | 92.22 | 25.15 | 34.13 |
| LLaVA-1.5-7B | 7.2B | 2024-04-29 | 21.56 | 87.43 | 16.17 | 33.53 |
| LLaVA-1.5-13B | 13.4B | 2024-04-29 | 22.16 | 91.62 | 19.76 | 32.93 |
| Qwen-VL-7B-Chat | 9.6B | 2024-04-29 | 41.32 | 82.63 | 29.94 | 20.96 |
| mPLUG-OWL2 | 8.2B | 2024-04-29 | 22.16 | 90.42 | 17.37 | 28.14 |
| MiniGPT4-v2 | 8B | 2024-04-29 | 41.92 | 81.44 | 32.93 | 27.54 |
| CogVLM | 17B | 2024-04-29 | 22.75 | 91.02 | 20.96 | 27.54 |
| InstructBLIP2-T5-XL | 4B | 2024-04-29 | 8.38 | 51.50 | 1.80 | - |
| InstructBLIP2-T5-XXL | 12B | 2024-04-29 | 11.98 | 51.50 | 4.79 | - |
| InstructBLIP2-7B | 8B | 2024-04-29 | 24.55 | 51.50 | 4.19 | - |
| InstructBLIP2-13B | 14B | 2024-04-29 | 19.76 | 49.10 | 4.19 | - |
| Random Choice | - | 2024-04-29 | - | - | - | 24.95 |

Overall results of different models on SIUO. On the original leaderboard, the best-performing model in each category is shown in bold and the second best is underlined. Except for the multiple-choice column, all results are based on manual evaluation.
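To make the relationship between the three generation metrics explicit, here is a minimal sketch of how they could be computed from per-response manual annotations. The boolean label names `safe` and `effective` are hypothetical, not the authors' annotation schema; the point is that "Safe & Effective" requires both labels on the same response, so it can never exceed the smaller of the other two columns (e.g., GPT-4o: 50.90 Safe, 95.81 Effective, 46.71 both).

```python
# A minimal sketch of the three generation metrics, assuming each
# response carries manual boolean labels "safe" and "effective"
# (hypothetical label names, for illustration only).
def leaderboard_metrics(labels: list[dict]) -> dict[str, float]:
    n = len(labels)
    safe = sum(x["safe"] for x in labels)
    effective = sum(x["effective"] for x in labels)
    both = sum(x["safe"] and x["effective"] for x in labels)
    return {
        "Safe": 100 * safe / n,
        "Effective": 100 * effective / n,
        # A response must satisfy both criteria at once, so this
        # value is bounded above by min(Safe, Effective).
        "Safe & Effective": 100 * both / n,
    }
```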

Error Examples

BibTeX


      @article{wang2024cross,
        title={Cross-Modality Safety Alignment},
        author={Siyin Wang and Xingsong Ye and Qinyuan Cheng and Junwen Duan and Shimin Li and Jinlan Fu and Xipeng Qiu and Xuanjing Huang},
        journal={arXiv preprint arXiv:2406.15279},
        year={2024},
        url={https://arxiv.org/abs/2406.15279},
        archivePrefix={arXiv},
        eprint={2406.15279},
        primaryClass={cs.AI},
      }