68
https://pubmed.ncbi.nlm.nih.gov/38117790
This study found that, on perturbed instances, the confidence a BERT-based question-answering system assigns to its answers differs significantly from that of an agnostic model. The authors argue that stronger testing protocols are needed before such systems are deployed in real-world applications with significant human impact.