Language AI turns out to know whether its answers are correct! New research from Berkeley and other institutions goes viral; netizens: danger, danger, danger

By 18 Jul, 2022

But the team found that the language AI's ability to calibrate depends on the answer options being clear-cut. Adding an uncertain "none of the above" choice to the options compromised the language AI's calibration.

That is, the language AI model can calibrate its answers very well on multiple-choice questions in a particular format. With this premise established, the next step was to verify whether the language AI model could determine if its own answers were correct.
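To make "calibration" concrete: a model is well calibrated if, among the answers it gives with about 80% confidence, about 80% are actually correct. The sketch below shows one common way to measure this, the expected calibration error. It is only an illustration and not necessarily the metric the team used; the function name, the choice of 10 bins, and the toy numbers are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence, then average |accuracy - confidence|
    over the bins, weighting each bin by how many predictions it holds."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one prediction")

    # Assign each prediction to an equal-width confidence bin.
    bins = [[] for _ in range(n_bins)]
    for p, is_correct in zip(confidences, correct):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, is_correct))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(p for p, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - avg_conf)
    return ece


# Toy data (made up): 80% confidence and 4 of 5 correct is well calibrated.
well_calibrated = expected_calibration_error([0.8] * 5, [True] * 4 + [False])
# Toy data (made up): 90% confidence but always wrong is badly calibrated.
overconfident = expected_calibration_error([0.9] * 4, [False] * 4)
```

A value near 0 means the stated confidence matches the observed accuracy; larger values mean the model is over- or under-confident.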

In this round of testing, the goal was to have the AI model judge answers that lie close to its own decision boundary. To do so, the research team reused the questions from the previous round of testing, paired with samples of the language AI model's own answers.

The AI model was then asked to choose whether each sampled answer was true or false, and the team analyzed whether the probabilities it assigned to "true" and "false" were well calibrated. An example of the problem set is as follows.
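As a rough illustration of this kind of true/false self-evaluation, a prompt could be assembled as in the sketch below. The prompt wording, the function names `build_self_eval_prompt` and `p_true`, and the sample question are assumptions made for illustration, not the team's exact format.

```python
import math


def build_self_eval_prompt(question, proposed_answer):
    """Wrap a question and a sampled answer in a two-option true/false query.
    The exact wording here is hypothetical."""
    return (
        f"Question: {question}\n"
        f"Proposed Answer: {proposed_answer}\n"
        "Is the proposed answer:\n"
        " (A) True\n"
        " (B) False\n"
        "The proposed answer is:"
    )


def p_true(logprob_a, logprob_b):
    """Turn the model's log-probabilities for options (A) and (B)
    into a normalized probability that the answer is true."""
    a, b = math.exp(logprob_a), math.exp(logprob_b)
    return a / (a + b)


# Hypothetical sample question, for illustration only.
prompt = build_self_eval_prompt(
    "What is the capital of France?", "Paris"
)
# Suppose the model put probability 0.6 on "(A)" and 0.2 on "(B)".
confidence = p_true(math.log(0.6), math.log(0.2))
```

The resulting `confidence` values, collected over many questions, are what would then be checked for calibration against whether the sampled answers were actually right.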

After 20 true-false tests, the team found that the language AI models were clearly well calibrated when judging their own answers as "true" or "false".

That is, when an AI model is given questions within a certain range, it evaluates its answers to those questions as true or false with a reasonable, well-calibrated level of confidence.

This also demonstrates that language AI models can indeed determine whether their claims about a question are correct.

Finally, the research team asked a more difficult question of the language AI models: whether the AI models could be trained to predict whether they knew the answer to any given question.

In this stage, the research team introduced a quantity, P(IK) (the probability that "I know" the answer), and trained the model to predict it using one of the following two methods.

Value Head: P(IK) is trained as the logit from an additional value head added to the model (independent of the logits used for language modeling). The advantage of this approach is that the research team can easily probe P(IK) at general token positions.
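The value-head idea can be sketched in a few lines: a small linear layer reads a hidden state and outputs a single logit, which a sigmoid turns into P(IK). The code below is a minimal toy version under strong assumptions. The hidden states are made-up two-dimensional vectors rather than real language-model activations, only the head is trained, and all names (`p_ik`, `train_value_head`) and hyperparameters are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def p_ik(hidden, weights, bias):
    """Value head: one linear layer producing a single logit,
    squashed into a probability that the model knows the answer."""
    logit = sum(w * h for w, h in zip(weights, hidden)) + bias
    return sigmoid(logit)


def train_value_head(examples, dim, lr=0.5, epochs=200):
    """Fit the head with stochastic gradient descent on binary cross-entropy.
    Each example is (hidden_state, label); label is 1.0 if the model
    answered that question correctly, else 0.0."""
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        for hidden, label in examples:
            # Gradient of the cross-entropy loss with respect to the logit.
            grad = p_ik(hidden, weights, bias) - label
            weights = [w - lr * grad * h for w, h in zip(weights, hidden)]
            bias -= lr * grad
    return weights, bias


# Toy stand-ins for hidden states (assumed, not real model activations).
known_state = [1.0, 0.0]    # a question the model got right
unknown_state = [0.0, 1.0]  # a question the model got wrong
weights, bias = train_value_head(
    [(known_state, 1.0), (unknown_state, 0.0)], dim=2
)
p_known = p_ik(known_state, weights, bias)
p_unknown = p_ik(unknown_state, weights, bias)
```

Because the head is just a function of the hidden state, it can in principle be evaluated at any token position, which is the advantage described above.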
