
Cross-Modality Safety Alignment


1Fudan University, 2National University of Singapore

†Corresponding Author

An example of SIUO (Safe Inputs but Unsafe Output): the input consists of a safe image and safe text, but their semantic combination is unsafe. Such inputs can easily prompt LVLMs to generate unsafe responses.

🔔News

🚀[2024-06-12]: Excited to share our new benchmark on cross-modality safety alignment on GitHub!🌟

Introduction

As Artificial General Intelligence (AGI) becomes increasingly integrated into various facets of human life, ensuring the safety and ethical alignment of such systems is paramount. Previous studies have primarily focused on single-modality threats, which may not suffice given the integrated and complex nature of cross-modality interactions. We introduce a novel safety alignment challenge called Safe Inputs but Unsafe Output (SIUO) to evaluate cross-modality safety alignment. Specifically, it considers cases where each modality is safe independently but the combination could lead to unsafe or unethical outputs. To empirically investigate this problem, we developed SIUO, a cross-modality benchmark encompassing 9 critical safety domains, such as self-harm, illegal activities, and privacy violations. Our findings reveal substantial safety vulnerabilities in both closed- and open-source LVLMs, such as GPT-4V and LLaVA, underscoring the inability of current models to reliably interpret and respond to complex, real-world scenarios.

SIUO Benchmark

Overview

We introduce the Safe Inputs but Unsafe Output (SIUO) benchmark, meticulously curated to assess the cross-modality safety alignment capability of LVLMs. It covers nine safety domains across disciplines: Self-Harm, Dangerous Behavior, Morality, Illegal Activities & Crime, Controversial Topics & Politics, Discrimination & Stereotyping, Religious Beliefs, Information Misinterpretation, and Privacy Violation.


SIUO is designed to evaluate three essential dimensions in multimodal models: integration, knowledge, and reasoning. Our goal is to assess how effectively these models can integrate information from various modalities, align with human values through substantial knowledge, and apply ethical reasoning to predict outcomes and ensure user safety. This comprehensive evaluation ensures that models can meet the stringent safety standards required in real-world applications.


Statistics

The SIUO dataset comprises 167 meticulously crafted test cases, with an average length of 27.2 words, covering nine critical safety domains. To ensure that our examples capture greater diversity within each safety domain, we further decompose the domains into safety subclasses. For example, self-harm is divided into suicide, NSSI, and unhealthy habits. The statistics for the safety domains and safety subclasses are shown in the figure below.

SIUO covers 9 safety domains and 33 subcategories.
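As a concrete illustration of the dataset layout, the sketch below tallies per-domain and per-subclass counts from an annotation file. The file name `siuo.json` and the field names `category` and `sub-category` are assumptions for illustration; consult the GitHub release for the actual schema.

```python
# A minimal sketch of tallying SIUO's safety-domain statistics.
# The file name and field names below are hypothetical placeholders,
# not the released schema.
import json
from collections import Counter

with open("siuo.json", encoding="utf-8") as f:
    cases = json.load(f)  # expected: a list of test-case dicts

# Count cases per safety domain and per (domain, subclass) pair.
domains = Counter(c["category"] for c in cases)
subclasses = Counter((c["category"], c["sub-category"]) for c in cases)

print(f"{len(cases)} cases across {len(domains)} domains "
      f"and {len(subclasses)} subclasses")
for domain, count in domains.most_common():
    print(f"  {domain}: {count}")
```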

Experiment Results

Leaderboard

We evaluate various LVLMs, including both closed- and open-source models. Our evaluation is conducted in a zero-shot setting to assess the capability of models to generate accurate answers without fine-tuning or few-shot demonstrations on our benchmark. For all models, we use each model's default prompt for multiple-choice or open-ended QA, where available.
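For reference, the sketch below illustrates this zero-shot query pattern: a single image-text pair is sent to the model with no few-shot examples and default decoding. It uses the OpenAI Python SDK as one concrete backend; the helper name `query_lvlm` and the choice of `gpt-4o` are illustrative, not the authors' exact evaluation harness.

```python
# A minimal sketch of a zero-shot LVLM query: one image plus one
# instruction, no few-shot demonstrations, default decoding settings.
# Requires the OpenAI Python SDK (>= 1.0); any LVLM API can be
# substituted here.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def query_lvlm(image_path: str, question: str, model: str = "gpt-4o") -> str:
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```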

| Model | Size | Date | Safe (%) | Effective (%) | Safe & Effective (%) | Multi-Choice (%) |
|---|---|---|---|---|---|---|
| GPT-4V(ision) | - | 2024-04-29 | 53.29 | 69.46 | 23.35 | 38.92 |
| GPT-4o | - | 2024-05-06 | 50.90 | 95.81 | 46.71 | 41.32 |
| Gemini 1.5 Pro | - | 2024-04-30 | 52.10 | 91.62 | 45.51 | 47.31 |
| LLaVA-1.6-34B | 34B | 2024-04-29 | 40.72 | 95.81 | 37.13 | 52.69 |
| Gemini 1.0 Pro | - | 2024-04-29 | 27.54 | 92.22 | 25.15 | 34.13 |
| LLaVA-1.5-7B | 7.2B | 2024-04-29 | 21.56 | 87.43 | 16.17 | 33.53 |
| LLaVA-1.5-13B | 13.4B | 2024-04-29 | 22.16 | 91.62 | 19.76 | 32.93 |
| Qwen-VL-7B-Chat | 9.6B | 2024-04-29 | 41.32 | 82.63 | 29.94 | 20.96 |
| mPLUG-OWL2 | 8.2B | 2024-04-29 | 22.16 | 90.42 | 17.37 | 28.14 |
| MiniGPT4-v2 | 8B | 2024-04-29 | 41.92 | 81.44 | 32.93 | 27.54 |
| CogVLM | 17B | 2024-04-29 | 22.75 | 91.02 | 20.96 | 27.54 |
| InstructBLIP2-T5-XL | 4B | 2024-04-29 | 8.38 | 51.50 | 1.80 | - |
| InstructBLIP2-T5-XXL | 12B | 2024-04-29 | 11.98 | 51.50 | 4.79 | - |
| InstructBLIP2-7B | 8B | 2024-04-29 | 24.55 | 51.50 | 4.19 | - |
| InstructBLIP2-13B | 14B | 2024-04-29 | 19.76 | 49.10 | 4.19 | - |
| Random Choice | - | 2024-04-29 | - | - | - | 24.95 |

Overall results of different models on SIUO. On the original leaderboard, the best-performing model in each category is shown in bold and the second best is underlined. Except for the multiple-choice column, all results are based on manual evaluation.
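To make the relationship between the three generation metrics explicit, here is a minimal sketch of how they could be computed from per-response manual annotations. The boolean label names `safe` and `effective` are hypothetical, not the authors' annotation schema; the point is that "Safe & Effective" requires both labels on the same response, so it can never exceed the smaller of the other two columns (e.g., GPT-4o: 50.90 Safe, 95.81 Effective, 46.71 both).

```python
# A minimal sketch of the three generation metrics, assuming each
# response carries manual boolean labels "safe" and "effective"
# (hypothetical label names, for illustration only).
def leaderboard_metrics(labels: list[dict]) -> dict[str, float]:
    n = len(labels)
    safe = sum(x["safe"] for x in labels)
    effective = sum(x["effective"] for x in labels)
    both = sum(x["safe"] and x["effective"] for x in labels)
    return {
        "Safe": 100 * safe / n,
        "Effective": 100 * effective / n,
        # A response must satisfy both criteria at once, so this
        # value is bounded above by min(Safe, Effective).
        "Safe & Effective": 100 * both / n,
    }
```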

Error Examples

BibTeX


      @article{wang2024cross,
        title={Cross-Modality Safety Alignment},
        author={Siyin Wang and Xingsong Ye and Qinyuan Cheng and Junwen Duan and Shimin Li and Jinlan Fu and Xipeng Qiu and Xuanjing Huang},
        journal={arXiv preprint arXiv:2406.15279},
        year={2024},
        url={https://arxiv.org/abs/2406.15279},
        archivePrefix={arXiv},
        eprint={2406.15279},
        primaryClass={cs.AI},
      }