Qwen2.5 performance comparison
| Dataset | Qwen2-72B-Instruct | Llama3.1-70B-Instruct | Qwen2.5-32B-Instruct | Mistral-Large-Instruct-2407 (123B) | GPT4o-mini | Qwen2.5-72B-Instruct |
|---|---|---|---|---|---|---|
| IFEval (multilingual) | 79.69 | 80.47 | 82.68 | 82.69 | 85.03 | 86.98 |
| AMMLU (Arabic) | 68.85 | 70.08 | 70.44 | 69.24 | 69.77 | 72.44 |
| JMMLU (Japanese) | 77.37 | 73.89 | 76.55 | 75.77 | 73.74 | 80.56 |
| KMMLU (Korean) | 57.04 | 53.23 | 60.75 | 56.42 | 56.77 | 61.96 |
| IndoMMLU (Indonesian) | 66.31 | 67.50 | 66.42 | 63.21 | 67.75 | 69.25 |
| TurkishMMLU (Turkish) | 69.22 | 66.89 | 72.41 | 64.78 | 71.19 | 76.12 |
| okapi MMLU (translated) | 77.84 | 76.49 | 77.16 | 78.37 | 73.44 | 79.97 |
| MGSM8K (extended) | 82.72 | 73.31 | 87.15 | 89.01 | 87.36 | 88.16 |
| BLEnD | 25.90 | 30.49 | 27.88 | 33.47 | 35.91 | 32.48 |

| Dataset | Qwen2-7B-Instruct | Llama3.1-8B-Instruct | Qwen2.5-7B-Instruct | Gemma-2-9B-Instruct | Mistral-Nemo-Instruct-2407 (12B) | Qwen2.5-14B-Instruct |
|---|---|---|---|---|---|---|
| IFEval (multilingual) | 51.43 | 60.68 | 74.87 | 77.47 | 64.59 | 77.08 |
| AMMLU (Arabic) | 54.87 | 54.28 | 59.78 | 60.26 | 53.92 | 66.81 |
| JMMLU (Japanese) | 57.71 | 53.26 | 61.88 | 64.59 | 55.17 | 72.78 |
| KMMLU (Korean) | 43.96 | 42.28 | 46.59 | 46.24 | 42.22 | 59.71 |
| IndoMMLU (Indonesian) | 54.05 | 53.92 | 56.42 | 61.73 | 50.76 | 65.09 |
| TurkishMMLU (Turkish) | 49.27 | 45.61 | 54.28 | 55.44 | 34.44 | 66.85 |
| okapi MMLU (translated) | 60.47 | 55.18 | 66.98 | 46.72 | 59.65 | 72.12 |
| MGSM8K (extended) | 56.13 | 66.05 | 66.11 | 78.37 | 54.75 | 82.27 |
| BLEnD | 22.49 | 19.47 | 23.66 | 28.31 | 30.47 | 32.48 |