From Citation Hallucination to Precise Generation: ScholarCopilot Rewrites the Rules of Academic Writing
Academic writing has always demanded extensive literature review and data analysis, and AI is now changing how that work gets done. Large language models (such as ChatGPT and GPT-4), however, often suffer from a problem known as “citation hallucination” when generating academic text: the prose is coherent and fluent, but the models frequently fabricate citations to works that do not exist, which seriously undermines the reliability of the writing.

Recently, a team of Chinese researchers from the University of Waterloo in Canada and Carnegie Mellon University proposed a new intelligent academic writing framework called ScholarCopilot. The framework moves beyond the traditional “search-generate” approach: through a more precise mechanism, it directly targets citation hallucination, bringing new hope to the academic community.
The Innovations of ScholarCopilot
To understand what sets ScholarCopilot apart, we first need to look at how existing AI writing tools handle citations. The commonly used Retrieval-Augmented Generation (RAG) method follows a “retrieve first, then generate” process: the model first retrieves relevant literature from a database and then generates text conditioned on that literature.
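The “retrieve first, then generate” flow can be sketched in a few lines of Python. This is only a toy illustration: the corpus, the word-overlap scoring rule, and the `generate` stub are hypothetical stand-ins, not components of any real RAG system or of ScholarCopilot.

```python
from typing import Dict, List

# Hypothetical toy corpus: citation key -> short description.
CORPUS: Dict[str, str] = {
    "smith2020": "dense retrieval with transformer encoders",
    "lee2021": "citation recommendation for scientific papers",
    "chen2019": "image classification with convolutional networks",
}


def retrieve(query: str, k: int = 2) -> List[str]:
    """Rank papers by the number of words they share with the query."""
    query_words = set(query.lower().split())

    def overlap(key: str) -> int:
        return len(query_words & set(CORPUS[key].split()))

    return sorted(CORPUS, key=overlap, reverse=True)[:k]


def generate(prompt: str, context: List[str]) -> str:
    """Stand-in for a language model call (assumption: no real model here).

    The context is fixed before generation starts and never updated.
    """
    return f"{prompt} [{', '.join(context)}]"


def rag_pipeline(prompt: str) -> str:
    # Step 1: retrieve once, up front.
    context = retrieve(prompt)
    # Step 2: generate with whatever was retrieved.
    return generate(prompt, context)


print(rag_pipeline("citation recommendation with transformer encoders"))
```

Note that `retrieve` runs exactly once, before any text exists, so the retrieved set cannot follow the direction the text takes later.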

Although this method keeps citations reasonably accurate, it has two problems:
1. Independence of retrieval and generation: Retrieval and generation run as separate steps, so the generated text can end up only loosely connected to the references it cites, which hurts the quality and accuracy of the paper.
2. Inability to adjust the citation strategy dynamically: Because retrieval happens before generation, the method struggles to adapt its citations as the context changes, which can result in inappropriate citations.
To address these issues, ScholarCopilot proposes a dynamic mechanism of “generating while retrieving.” During text generation, the model itself decides when a citation is needed and, at that point, issues a retrieval request by emitting a special retrieval signal ([RET]). The system then retrieves relevant literature from academic databases in real time and feeds it into the subsequent generation. By jointly optimizing the generation task and the retrieval task, ScholarCopilot significantly improves the accuracy and relevance of its citations.
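The control flow of “generating while retrieving” can be sketched as a loop that pauses whenever the [RET] signal appears. The sketch below is a minimal illustration under several assumptions: the token stream is scripted rather than produced by a model, the database and the word-overlap retriever are hypothetical, and using the words written since the last citation as the query is a simplification chosen for this example. Because the tokens are scripted, the sketch also does not show retrieved content influencing later generation.

```python
from typing import Dict, List

RET = "[RET]"  # the retrieval signal named in the text

# Hypothetical toy database: citation key -> short description.
DATABASE: Dict[str, str] = {
    "smith2020": "dense retrieval with transformer encoders",
    "lee2021": "citation recommendation for scientific papers",
}

# Scripted stand-in for model output (assumption). A real model would
# decide on its own when to emit RET.
SCRIPTED_TOKENS: List[str] = [
    "Recent", "work", "studies", "dense", "retrieval", RET,
    "and", "citation", "recommendation", RET, ".",
]


def retrieve(query_words: List[str]) -> str:
    """Return the key whose description shares the most words with the query."""
    query = {word.lower() for word in query_words}
    return max(DATABASE, key=lambda key: len(query & set(DATABASE[key].split())))


def generate_with_retrieval(tokens: List[str]) -> str:
    output: List[str] = []
    since_last_citation: List[str] = []
    for token in tokens:
        if token == RET:
            # Pause generation, look up a reference, insert it, then go on.
            key = retrieve(since_last_citation)
            output.append(f"[{key}]")
            since_last_citation = []
        else:
            output.append(token)
            since_last_citation.append(token)
    return " ".join(output)


print(generate_with_retrieval(SCRIPTED_TOKENS))
```

Unlike a single up-front lookup, each retrieval here is triggered at the position where the citation belongs and uses the text written up to that point.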

This mirrors how people actually write: when drafting a paper, we usually start with the main content, search for references as the need arises, insert the appropriate citation, and then continue writing. Compared with traditional methods, ScholarCopilot’s approach therefore produces academic text in a more natural way, with citations chosen to fit the passage in which they appear.
How Does It Perform?
To evaluate the performance of ScholarCopilot, the research team conducted extensive tests. They trained the model using 500,000 papers from arXiv and assessed its performance across multiple dimensions.
1. Citation Retrieval Accuracy: ScholarCopilot reaches a Top-1 accuracy of 40.1%, well ahead of the other retrieval models tested:
- E5-Mistral-7B-Instruct (15.0%)
- BM25 (9.8%)
2. Quality of Paper Generation: Across five dimensions (relevance, coherence, academic rigor, completeness, and innovativeness), ScholarCopilot achieved an overall score of 16.2 out of 25, surpassing both the much larger Qwen-2.5-72B-Instruct model (15.8) and the Qwen-2.5-7B-Instruct model (13.9). ScholarCopilot therefore performs well not only on citation accuracy but also on the quality of the generated content.
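For readers unfamiliar with the metric, Top-1 accuracy is conventionally the share of queries for which the top-ranked candidate is the correct reference. The following sketch shows that standard definition; the rankings and gold labels are made-up toy values, and the function is not the research team’s evaluation code.

```python
from typing import List, Sequence


def top_k_accuracy(rankings: Sequence[Sequence[str]], gold: Sequence[str], k: int = 1) -> float:
    """Fraction of queries whose gold reference appears in the top-k results."""
    if not gold or len(rankings) != len(gold):
        raise ValueError("rankings and gold must be non-empty and of equal length")
    hits = sum(1 for ranking, target in zip(rankings, gold) if target in ranking[:k])
    return hits / len(gold)


# Made-up toy data: one ranked candidate list per citation slot.
toy_rankings: List[List[str]] = [
    ["a", "b"],
    ["c", "a"],
    ["b", "c"],
    ["d", "a"],
    ["a", "d"],
]
toy_gold: List[str] = ["a", "a", "c", "d", "b"]

print(top_k_accuracy(toy_rankings, toy_gold, k=1))
print(top_k_accuracy(toy_rankings, toy_gold, k=2))
```

Raising `k` can only keep the score the same or increase it, which is why Top-1 is the strictest version of the metric.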

User Reviews
More importantly, ScholarCopilot has also been evaluated by human users. The participants were 10 students with an average of 4.2 years of academic writing experience: 5 doctoral students, 4 master’s students, and 1 undergraduate student. The results are encouraging:
- Citation quality: 100% of respondents rated the quality of ScholarCopilot’s citations very highly, a strong sign that the approach tackles “citation hallucination” effectively.
- Overall practicality: More than 70% of respondents rated ScholarCopilot as practical overall in actual use.

These results demonstrate ScholarCopilot’s advantage in citation generation and suggest that it is broadly applicable to academic writing.
The Limitations of ScholarCopilot and Its Improvement Directions
Despite these results, ScholarCopilot still has limitations, mainly in two areas:
1. Completeness of content generation: Although the model generates academic text well, the richness and comprehensiveness of that content leave room for improvement. Future versions may be optimized to cover the relevant research area more fully.
2. Limited innovativeness: The current version is not yet good at generating novel ideas and research questions, leaving significant room for improvement. Future versions are expected to make progress in this area as the technology develops.
The respondents also offered some valuable suggestions for future versions of ScholarCopilot:
- Integrate more closely with mainstream academic writing platforms (such as Overleaf) so that the tool fits into existing workflows.
- Support generating individual sections independently and predicting text at any cursor position, so that users can write more flexibly.
The research team stated that this feedback will serve as an important reference for future versions, helping ScholarCopilot continue to improve and better meet the needs of academic writing.
Summary and Outlook
As AI technology develops, academic writing tools are improving with it. Through its “generating while retrieving” mechanism, ScholarCopilot effectively addresses the “citation hallucination” problem of AI-assisted academic writing, improving both the accuracy of citations and the quality of the generated text. Although the current version still has limitations, its arrival marks a major change in how academic writing can be done, and future versions are expected to serve a broader range of writing needs.
As the technology advances, academic writing should become far more efficient. Researchers will be able to focus on innovation and thinking rather than on tedious literature searches and citation problems. In the near future, AI may well become an indispensable assistant to the academic community.