โš–๏ธ Speech-to-Speech Model Comparison

Welcome to the Speech-to-Speech (S2S) Model Evaluation! 👏 In this evaluation, you will assess the performance of different S2S models, such as ChatGPT-4o, FunAudioLLM, SpeechGPT, Mini-Omni, Cascade, and LLaMA-Omni.
🎯 Goal: Test how well these models handle speech tasks across different domains.

How It Works
Once you select a specific domain and task (e.g., Educational Tutoring and Rhythm Control), you will proceed to the evaluation stage. In each round, you will be presented with an audio input.
🌰 Example:

Audio Sample:
The corresponding text is: "Say the following sentence at my speed first, then say it again very slowly: 'Artificial intelligence is changing the world in many ways.'"
🧠 (Note: the audio plays at 1.5x the normal speed.)

Model Performance
ChatGPT-4o:

🎙️ Speech: Partially followed the instruction on speed.

🧾 Semantics: Accurately followed the instruction, with no semantic deviation or missing information.


FunAudioLLM:

🎙️ Speech: Partially followed the instruction on speed.

🧾 Semantics: Accurately followed the instruction, with no semantic deviation or missing information.


SpeechGPT:

🎙️ Speech: Did not follow the instruction on speed.

🧾 Semantics: Partially followed the instruction, with minor semantic deviation and missing information.


Mini-Omni:

🎙️ Speech: Did not follow the instruction on speed.

🧾 Semantics: Did not follow the instruction, with significant semantic deviation and missing information.

After making your choice, you'll proceed to the next round. 🔄

Click the button below to start the evaluation! 🚀

Start Evaluation