Mastodon - 2024-02-06T05:19:55Z

Mastodon

The words โ€œwrong" and "careful" are doing a tremendous amount of heavy lifting in this sentence:

"Calibration has so far been studied in non generative (e.g., classification) settings, especially in Software Engineering. However, generated code can quite often be wrong: Developers need to know when they should e.g., directly use, use after careful review, or discard model-generated code.โ€

arxiv.org/abs/2402.02047

Mastodon Source ๐Ÿ˜

Given the oracle is known unreliable, this seems odd.

"We consider...reflective probability, obtained by re-invoking the model, prompting it to estimate how confident it is in the correctness of of the code it had just generated (See Figure 1)."

Mastodon Source ๐Ÿ˜