Why Data Contamination Is A Big Issue For LLMs
TechTalks, Monday, July 17, 2023
Since the release of ChatGPT, and even more so since GPT-4, I have seen a recurring pattern of hype and disappointment. First, a study claims that ChatGPT, GPT-4, or 'name your LLM' has passed or aced some difficult test designed for humans: the bar exam, math exams, MIT exams, coding competitions, comprehension tests, etc. Then another study disproves the results of the first.
It turns out that when examined more closely, the model is providing the right answers for the wrong reasons.
The research community is still exploring the correct ways to evaluate the capabilities of large language models (LLMs). In the meantime, we're discovering why the initial results of LLMs on human tests are misleading. One of the key reasons is 'data contamination,' which means the test examples were included in the model's training data.
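To make the idea concrete, here is a minimal sketch of one common way to look for contamination: measuring how many word n-grams of a test example also appear verbatim in a training corpus. The function names (`ngrams`, `overlap_ratio`), the whitespace tokenization, and the default window of 8 words are illustrative assumptions, not a method described in any particular study.

```python
def ngrams(text, n):
    """Return the set of word n-grams in text (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(test_example, training_docs, n=8):
    """Fraction of the test example's n-grams found verbatim in the training docs.

    Hypothetical helper: a high ratio suggests the example may have leaked
    into the training data; a low ratio does not prove it is clean.
    """
    test_grams = ngrams(test_example, n)
    if not test_grams:
        # Example shorter than n words: nothing to compare.
        return 0.0
    train_grams = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)
    return len(test_grams & train_grams) / len(test_grams)
```

A check like this only catches verbatim overlap. Paraphrased or translated copies of a test question would pass unnoticed, and for closed models the training corpus is usually not available to inspect at all.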