Leaderboard in MultiTrust (Updating...)

Column key: T = Truthfulness (T.I inherent deficiency, T.M misguided mistakes), S = Safety (S.T toxicity, S.J jailbreaking), R = Robustness (R.O out-of-distribution, R.A adversarial attack), F = Fairness (F.S stereotype, F.B bias and preference), P = Privacy (P.A awareness, P.L leakage). Avg. is the overall score; all values are on a 0-100 scale.

Rank | Model | Source | Avg. | T.I | T.M | S.T | S.J | R.O | R.A | F.S | F.B | P.A | P.L |
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
1 | GPT-4-Turbo🥇 | Link | 78.3 | 75.1 | 76.6 | 80.5 | 92.5 | 80.9 | 55.9 | 79.4 | 83.1 | 74.4 | 84.3 |
2 | Claude3.5-Sonnet🥈 | Link | 76.7 | 72.5 | 67.1 | 81.5 | 94.0 | 68.0 | 58.5 | 89.7 | 69.1 | 69.1 | 97.5 |
3 | GPT-4o🥉 | Link | 76.6 | 78.3 | 67.3 | 79.5 | 89.0 | 82.0 | 56.1 | 86.9 | 59.0 | 76.6 | 91.5 |
4 | Claude3-Sonnet | Link | 72.8 | 66.8 | 60.3 | 77.2 | 97.4 | 72.7 | 52.0 | 75.5 | 63.1 | 63.3 | 99.3 |
5 | Phi-3.5-Vision | Link | 66.3 | 58.9 | 47.2 | 65.1 | 89.8 | 74.0 | 54.4 | 90.1 | 64.0 | 61.1 | 58.2 |
6 | Phi-3-Vision | Link | 64.3 | 58.6 | 44.1 | 63.9 | 85.6 | 73.4 | 51.2 | 92.0 | 50.4 | 65.2 | 58.2 |
7 | GLM-4v-9B | Link | 63.9 | 66.1 | 52.8 | 67.2 | 79.4 | 78.4 | 70.8 | 88.8 | 37.8 | 60.8 | 36.8 |
8 | Qwen-VL-Plus | Link | 63.5 | 68.5 | 59.4 | 68.8 | 66.2 | 75.2 | 36.6 | 64.1 | 82.9 | 59.8 | 53.5 |
9 | Cambrian-13B | Link | 63.5 | 64.4 | 54.0 | 68.5 | 72.3 | 72.2 | 41.8 | 80.4 | 66.7 | 53.2 | 61.1 |
10 | Qwen2-VL-Chat | Link | 63.3 | 68.7 | 50.0 | 65.0 | 79.9 | 79.0 | 39.0 | 83.0 | 70.1 | 65.1 | 32.9 |
11 | Cambrian-8B | Link | 62.7 | 62.1 | 52.3 | 67.4 | 66.2 | 70.8 | 47.4 | 78.7 | 68.2 | 54.1 | 59.8 |
12 | InternVL2-8B | Link | 62.2 | 64.2 | 52.1 | 62.8 | 78.3 | 75.4 | 38.9 | 89.0 | 64.7 | 60.4 | 36.1 |
13 | LLaVA-v1.6-Vicuna-13B-hf | Link | 61.9 | 58.8 | 50.1 | 68.5 | 44.3 | 76.6 | 56.0 | 84.8 | 77.5 | 46.3 | 56.1 |
14 | Hunyuan-V | Link | 61.6 | 66.0 | 52.3 | 67.1 | 56.4 | 74.1 | 73.5 | 82.6 | 35.9 | 61.8 | 46.7 |
15 | Llama3-LLaVA-NeXT-8b-hf | Link | 59.8 | 58.4 | 49.7 | 69.5 | 40.5 | 76.4 | 56.1 | 83.2 | 62.5 | 56.8 | 45.1 |
16 | GeminiPro-1.0 | Link | 59.6 | 65.1 | 67.3 | 72.8 | 55.8 | 78.4 | 50.4 | 72.3 | 27.7 | 70.5 | 35.7 |
17 | DeepSeek-VL-7b | Link | 58.9 | 54.9 | 39.9 | 66.3 | 58.0 | 75.9 | 58.1 | 76.4 | 74.2 | 49.0 | 36.6 |
18 | LLaVA-NeXT-13B | Link | 58.3 | 55.5 | 58.6 | 68.4 | 43.5 | 76.5 | 39.4 | 67.9 | 63.5 | 53.8 | 55.5 |
19 | Llama-3.2-Vision-Instruct | Link | 58.2 | 63.9 | 56.8 | 61.2 | 58.5 | 76.9 | 42.0 | 79.2 | 36.1 | 57.2 | 49.7 |
20 | mPLUG-Owl3-7B | Link | 57.6 | 55.7 | 45.9 | 57.3 | 68.9 | 64.2 | 37.8 | 73.0 | 63.6 | 67.7 | 41.9 |
21 | InternLM-Xcomposer2 | Link | 56.8 | 61.8 | 52.9 | 63.6 | 51.2 | 75.4 | 38.9 | 79.8 | 49.1 | 60.4 | 35.0 |
22 | MiniGPT-4-Llama2-7B | Link | 55.7 | 48.3 | 50.2 | 69.8 | 74.5 | 63.0 | 35.4 | 65.7 | 37.5 | 42.5 | 70.0 |
23 | LLaVA-v1.6-Mistral-7B-hf | Link | 54.7 | 58.4 | 48.8 | 68.8 | 32.8 | 73.2 | 54.0 | 80.8 | 46.9 | 36.5 | 47.0 |
24 | DeepSeek-VL2 | Link | 54.6 | 58.2 | 45.0 | 54.8 | 61.5 | 68.0 | 58.3 | 82.5 | 51.7 | 39.0 | 26.5 |
25 | LLaVA-v1.6-Vicuna-7B-hf | Link | 54.3 | 50.5 | 39.8 | 69.5 | 37.1 | 66.0 | 55.4 | 68.4 | 57.2 | 58.2 | 40.9 |
26 | Molmo-7B | Link | 54.2 | 60.1 | 41.6 | 57.8 | 33.8 | 75.3 | 56.7 | 85.0 | 32.4 | 65.5 | 34.1 |
27 | InternVL-Chat-Vicuna-13B | Link | 53.6 | 58.8 | 52.4 | 56.4 | 43.3 | 71.7 | 55.2 | 71.1 | 35.0 | 57.9 | 33.7 |
28 | LLaVA-v1.5-13B-hf | Link | 53.4 | 53.9 | 50.9 | 63.7 | 49.8 | 75.0 | 31.0 | 84.2 | 30.6 | 57.9 | 37.0 |
29 | CogVLM2-Llama3-Chat-19B | Link | 52.9 | 57.6 | 44.5 | 57.1 | 30.7 | 75.7 | 63.3 | 85.0 | 32.1 | 47.1 | 36.1 |
30 | mPLUG-Owl2 | Link | 52.7 | 55.9 | 50.4 | 60.1 | 33.4 | 74.9 | 36.3 | 73.5 | 51.8 | 56.6 | 34.6 |
31 | LVIS-Instruct4V | Link | 52.1 | 54.8 | 46.8 | 58.8 | 49.3 | 64.2 | 29.1 | 71.5 | 35.7 | 58.6 | 52.3 |
32 | LLaVA-RLHF-13B | Link | 52.1 | 50.1 | 51.2 | 59.4 | 35.9 | 70.5 | 31.8 | 69.4 | 39.1 | 53.9 | 59.7 |
33 | LLaVA-v1.5-13B | Link | 51.7 | 58.8 | 53.9 | 61.9 | 39.5 | 74.1 | 30.8 | 67.8 | 39.5 | 54.1 | 36.2 |
34 | InternLM-XComposer | Link | 51.2 | 53.6 | 41.6 | 45.7 | 57.8 | 68.2 | 27.8 | 71.2 | 46.6 | 56.2 | 43.1 |
35 | Qwen-VL-Chat | Link | 51.1 | 59.0 | 49.2 | 59.2 | 39.6 | 72.1 | 41.7 | 64.6 | 34.6 | 53.6 | 37.2 |
36 | CogVLM | Link | 50.5 | 55.3 | 46.3 | 61.5 | 55.8 | 74.1 | 53.1 | 62.1 | 32.2 | 40.2 | 24.8 |
37 | ShareGPT4V-13B | Link | 50.3 | 55.8 | 50.2 | 59.0 | 39.1 | 69.3 | 33.2 | 70.4 | 34.7 | 51.8 | 39.1 |
38 | LLaVA-v1.5-7B-hf | Link | 49.0 | 49.9 | 41.2 | 58.1 | 30.8 | 72.8 | 31.6 | 82.7 | 49.4 | 50.8 | 22.6 |
39 | LLaVA-v1.5-7B | Link | 48.5 | 54.1 | 48.4 | 58.0 | 37.4 | 74.1 | 28.4 | 70.6 | 38.5 | 48.3 | 26.8 |
40 | DeepSeek-Janus-Pro-7B | Link | 48.1 | 53.9 | 39.3 | 59.2 | 19.5 | 73.7 | 61.6 | 74.0 | 22.6 | 43.5 | 34.0 |
41 | MiniGPT-4-Vicuna-13B | Link | 47.2 | 44.8 | 45.8 | 47.4 | 34.0 | 64.1 | 39.1 | 64.8 | 34.2 | 39.5 | 58.2 |
42 | InstructBLIP-FlanT5xxl | Link | 43.5 | 46.3 | 41.0 | 41.5 | 43.3 | 74.1 | 33.2 | 57.8 | 30.0 | 58.1 | 9.3 |
43 | Otter | Link | 40.8 | 42.0 | 34.3 | 50.6 | 45.6 | 57.8 | 24.0 | 57.9 | 34.5 | 40.9 | 20.5 |
44 | mPLUG-Owl | Link | 39.9 | 48.3 | 42.8 | 49.4 | 24.1 | 73.1 | 20.3 | 50.1 | 37.6 | 39.1 | 14.2 |
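
As a sanity check on the table, the `Avg.` column appears to be the plain arithmetic mean of the ten sub-aspect scores (T.I through P.L), rounded to one decimal. This is an observation from recomputing a few rows, not something the leaderboard states. The sketch below parses two rows copied from the table and recomputes the average; the helper names `parse_row` and `recomputed_avg` are hypothetical and not part of MultiTrust.

```python
from statistics import mean

# Two rows copied verbatim from the leaderboard above (first and last place).
ROWS = [
    "1 | GPT-4-Turbo🥇 | Link | 78.3 | 75.1 | 76.6 | 80.5 | 92.5 | 80.9 | 55.9 | 79.4 | 83.1 | 74.4 | 84.3 |",
    "44 | mPLUG-Owl | Link | 39.9 | 48.3 | 42.8 | 49.4 | 24.1 | 73.1 | 20.3 | 50.1 | 37.6 | 39.1 | 14.2 |",
]


def parse_row(line):
    """Split one pipe-delimited leaderboard row into typed fields."""
    # Drop optional leading/trailing pipes, then split on the remaining ones.
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    rank, model, _source, avg, *scores = cells
    return {
        "rank": int(rank),
        "model": model,
        "avg": float(avg),
        "scores": [float(s) for s in scores],  # T.I ... P.L, ten values
    }


def recomputed_avg(row):
    """Mean of the ten sub-aspect scores, rounded like the table (assumed)."""
    return round(mean(row["scores"]), 1)


parsed = [parse_row(line) for line in ROWS]
for row in parsed:
    print(row["model"], row["avg"], recomputed_avg(row))
```

For both rows the recomputed value agrees with the published `Avg.` to one decimal place. A mismatch on some other row would suggest either a transcription error or a different aggregation rule for that entry.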