ThaiSafetyBench Leaderboard 🛡️

[arXiv Paper] [GitHub] [Hugging Face Dataset 🤗]

ThaiSafetyBench is a safety benchmark tailored to the Thai language and culture.

Model	🥇 Overall ASR ⬇️	👉 Discrimination, Exclusion, Toxicity, Hateful, Offensive ASR ⬇️	👉 Human-Chatbot Interaction Harm ASR ⬇️	👉 Information Hazards ASR ⬇️	👉 Malicious Uses ASR ⬇️	👉 Misinformation Harms ASR ⬇️	👉 Thai Socio-Cultural Harms ASR ⬇️
GPT-5	4.43	2.19	5.98	0.55	1.06	7.84	8.97
Claude 4.5 Sonnet	9.75	3.49	17.09	0.66	4.39	13.41	19.47
SeaLLMs-v3-7B	9.83	8.96	15.38	0.22	6.8	10.63	16.98
Qwen2.5-72B-Instruct	10.99	6.27	14.53	1.1	8.61	12.54	22.9
openthaigpt1.5-72b-instruct	12.34	6.58	12.82	1.32	9.21	17.94	26.15
Llama-SEA-LION-v3-70B	12.7	5.48	12.39	2.42	11.48	14.63	29.77
Qwen2.5-7B-Instruct	14.43	6.97	16.24	1.87	10.88	17.25	33.4
SeaLLMs-v3-1.5B	14.61	13.94	13.68	2.2	14.65	13.24	29.96
GPT-4o	16.04	7.87	16.67	3.52	9.67	31.01	27.48
openthaigpt1.5-7b-instruct	16.09	8.47	14.53	2.2	11.78	23.52	36.07
typhoon2.1-gemma3-12b	16.64	7.37	17.95	3.85	7.1	31.71	31.87
Llama-3.3-70B-Instruct	16.87	8.67	13.25	3.08	13.44	18.12	44.66
Llama-SEA-LION-v3-8B-IT	16.9	8.57	19.23	4.07	10.29	20.91	38.36
llama3.1-typhoon2-70b-instruct	18.05	10.26	12.82	1.21	22.05	18.82	43.13
gemma-3-12b-it	20.4	8.86	24.79	9.45	10.27	30.49	38.55
typhoon2.1-gemma3-4b	22.97	13.55	28.63	5.38	13.29	35.19	41.79
Llama-3.1-70B-Instruct	24.49	15.64	17.09	3.96	25.57	30.31	54.39
Llama-3.2-3B	26.08	24.32	20.51	4.73	28.1	21.43	57.37
gemma-3-4b-it	28.11	18.55	31.62	16.83	17.54	37	47.14
Llama-3.1-8B-Instruct	28.24	25.5	21.79	4.84	25.08	27.18	65.08
llama3.1-typhoon2-8b-instruct	32.44	26.92	23.93	11.21	42.6	29.09	60.88
llama3.2-typhoon2-3b-instruct	34.33	30.78	26.92	11.21	49.09	29.97	58.02
Llama-3.2-1B	37.66	36.59	24.79	21.98	41.09	38.33	63.17
llama3.2-typhoon2-1b-instruct	49.35	42.03	40.17	37.78	67.92	46.95	61.26

How it works

We evaluate models on ThaiSafetyBench, a dataset of prompts designed to assess how a model handles sensitive topics, discrimination, misinformation, and other harmful content in the context of the Thai language and culture. Responses are judged automatically by two models, GPT-4.1 and Gemini-2.5-pro. For each task we report the Attack Success Rate (ASR) averaged over the two judges; a lower ASR indicates that the model is less vulnerable to harmful prompts. The tasks fall into two groups: Thai Culture-Related Attacks, which target content specific to Thai culture, including its norms, values, and sensitivities, and General Prompt Attacks, which cover broadly harmful content that is not unique to Thai culture but remains relevant in a wider context.
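
As a rough illustration of the metric described above, the sketch below computes a per-category ASR for each judge and then takes the unweighted mean of the two. The record layout (`category`, `judge`, `is_harmful`) and the function name `average_asr` are assumptions made for illustration and are not taken from the ThaiSafetyBench evaluation code. The sketch also assumes that ASR is the percentage of responses a judge labels as harmful.

```python
from collections import defaultdict


def average_asr(verdicts):
    """Return {category: ASR in percent, averaged over judges}.

    `verdicts` is an iterable of dicts with hypothetical keys:
      category   - harm category name
      judge      - name of the judge model
      is_harmful - True if the judge deemed the response harmful
    """
    # counts[category][judge] = [harmful_count, total_count]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for v in verdicts:
        cell = counts[v["category"]][v["judge"]]
        cell[0] += int(v["is_harmful"])
        cell[1] += 1

    result = {}
    for category, per_judge in counts.items():
        # ASR of each judge for this category, in percent
        rates = [100.0 * harmful / total for harmful, total in per_judge.values()]
        # Unweighted mean across judges
        result[category] = sum(rates) / len(rates)
    return result


# Toy verdicts, invented purely to exercise the function.
verdicts = [
    {"category": "Malicious Uses", "judge": "GPT-4.1", "is_harmful": True},
    {"category": "Malicious Uses", "judge": "GPT-4.1", "is_harmful": False},
    {"category": "Malicious Uses", "judge": "Gemini-2.5-pro", "is_harmful": False},
    {"category": "Malicious Uses", "judge": "Gemini-2.5-pro", "is_harmful": False},
]
print(average_asr(verdicts))  # {'Malicious Uses': 25.0}
```

In this toy input one judge flags half of the responses (50%) and the other flags none (0%), so the averaged ASR is 25%.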

Reproducibility

To reproduce our results, we provide the automatic evaluation code in our GitHub repository. You can run the evaluation on your own models by following these steps:

1. Generate your model's responses on the ThaiSafetyBench dataset with the temperature set to 0.1.
2. Use the provided evaluation script to evaluate the responses with the GPT-4.1 and Gemini-2.5-pro models as judges.
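
The generation step could be sketched along the following lines. This is only an illustration: `generate_fn` stands in for whatever inference backend you use, and the JSON Lines output layout is an assumption rather than the input format expected by the evaluation script in the repository. The only value taken from the steps above is the temperature of 0.1.

```python
import json

TEMPERATURE = 0.1  # value stated in the reproduction steps


def generate_responses(prompts, generate_fn, temperature=TEMPERATURE):
    """Call `generate_fn(prompt, temperature=...)` for every prompt.

    `generate_fn` is a hypothetical callable wrapping your model.
    """
    return [
        {"prompt": p, "response": generate_fn(p, temperature=temperature)}
        for p in prompts
    ]


def to_jsonl(records):
    """Serialize records as JSON Lines, keeping Thai text unescaped."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Stand-in backend for demonstration; replace with a real model call.
def echo_model(prompt, temperature):
    return f"[T={temperature}] {prompt}"


records = generate_responses(["สวัสดี"], echo_model)
print(to_jsonl(records))
```

Passing the backend in as a callable keeps the loop independent of any particular inference library, so the same code works with a local model or a hosted API.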

Developers and Maintainers

SCB DataX, SCBX R&D, SCB 10X AI Research team

For further inquiries, please contact us at trapoom-ukarapol@data-x.ai

✉️✨ Submit your model here!

To submit a model, please email the following information to trapoom-ukarapol@data-x.ai

Subject: [Your Model Name] ThaiSafetyBench Model Submission
Content:
- Model name
- Developer
- Parameters (in billions)
- Model type (Base or CPT)
- Base model name (if the model is a CPT, otherwise leave empty)
- Release date (YYYY-MM)
- How to run the model (if the model is on the Hugging Face Hub, Python code to generate responses; otherwise, a code snippet that runs the model and generates responses)
- Contact email (for us to contact you about the evaluation results)
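
If you prefer to prepare the email programmatically, a minimal sketch using Python's standard `email` package might look like the following. The function name `build_submission`, its parameters, and the validation rules are illustrative assumptions, as is the reading of `[Your Model Name]` in the subject as a placeholder to be replaced by the model name. All values in the example call are placeholders.

```python
import re
from email.message import EmailMessage

RECIPIENT = "trapoom-ukarapol@data-x.ai"  # address given above


def build_submission(model_name, developer, params_b, model_type,
                     base_model, release_date, how_to_run, contact_email):
    """Build the submission email; raise ValueError on malformed fields."""
    if model_type not in ("Base", "CPT"):
        raise ValueError("model_type must be 'Base' or 'CPT'")
    if not re.fullmatch(r"\d{4}-(0[1-9]|1[0-2])", release_date):
        raise ValueError("release_date must be YYYY-MM")
    if model_type == "CPT" and not base_model:
        raise ValueError("a CPT model needs a base model name")

    msg = EmailMessage()
    msg["To"] = RECIPIENT
    # Assumption: the bracketed text in the subject is a placeholder.
    msg["Subject"] = f"{model_name} ThaiSafetyBench Model Submission"
    msg.set_content("\n".join([
        f"- Model name: {model_name}",
        f"- Developer: {developer}",
        f"- Parameters (in billions): {params_b}",
        f"- Model type (Base or CPT): {model_type}",
        f"- Base model name: {base_model}",
        f"- Release date (YYYY-MM): {release_date}",
        f"- How to run the model: {how_to_run}",
        f"- Contact email: {contact_email}",
    ]))
    return msg


# Placeholder values only; substitute your own details.
msg = build_submission("my-model", "Example Lab", 7, "Base", "",
                       "2025-01", "see attached snippet", "me@example.com")
print(msg["Subject"])
print(msg.get_content())
```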