This article introduces ICE-Score, a novel evaluation metric for assessing the quality of code generated by large language models. Unlike existing metrics, ICE-Score achieves superior correlations with functional correctness and human preferences across multiple programming languages without relying on human-written test suites or references, demonstrating its efficacy in code intelligence tasks. https://arxiv.org/pdf/2304.14317.pdf