Using GPT-4 for Source Code Documentation

Authors

  • Magdalena Kneidinger, Institute of Business Informatics - Software Engineering, Johannes Kepler University
  • Markus Feneberger, Institute of Business Informatics - Software Engineering, Johannes Kepler University
  • Reinhold Plösch, Institute of Business Informatics - Software Engineering, Johannes Kepler University

Keywords:

LLM, Large Language Model, GPT-4, Software Documentation, Class Documentation, Method Documentation

Abstract

Writing good software documentation requires significant effort, and Large Language Models (LLMs) could potentially streamline that process. This raises the question of whether current LLMs can generate valid code documentation for classes and methods on the basis of the source code alone. According to the literature, various such models are capable of generating documentation that is on par with or even superior to reference documentation. In our experimental study using zero-shot prompting, we found that OpenAI's GPT-4 yields poor results when measuring similarity to the reference documentation at class level; GPT-4 is therefore not yet usable for generating class documentation. At method level, however, the model achieved higher similarity ratings and can be considered applicable for this use case.
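The similarity measurement mentioned in the abstract can be illustrated with a minimal sketch. Assuming one has a GPT-4-generated docstring and a human-written reference docstring (both example strings below are hypothetical, not taken from the study), a simplified ROUGE-1-style unigram-overlap F1 score can be computed as follows; real evaluations would use the full ROUGE package cited in the references:

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate and a reference text
    (a simplified stand-in for ROUGE-1)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # shared unigram count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical generated output vs. hypothetical reference documentation
generated = "Returns the sum of two integers."
reference = "Computes the sum of the two given integers."
print(f"ROUGE-1 F1: {rouge1_f1(generated, reference):.2f}")
```

A higher score indicates greater word overlap with the reference; such overlap metrics (like BLEU and ROUGE) reward surface similarity rather than semantic equivalence, which is one reason their validity is debated in the cited literature.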

References

A. Y. Wang, D. Wang, J. Drozdal, et al., “Documentation Matters: Human-Centered AI System to Assist Data Science Code Documentation in Computational Notebooks,” en, ACM Transactions on Computer-Human Interaction, vol. 29, no. 2, pp. 1–33, Apr. 2022, ISSN: 1073-0516, 1557-7325. DOI: 10.1145/3489465.

R. S. Geiger, N. Varoquaux, C. Mazel-Cabasse, and C. Holdgraf, “The Types, Roles, and Practices of Documentation in Data Analytics Open Source Software Libraries: A Collaborative Ethnography of Documentation Work,” en, Computer Supported Cooperative Work (CSCW), vol. 27, no. 3-6, pp. 767–802, Dec. 2018, ISSN: 0925-9724, 1573-7551. DOI: 10.1007/s10606-018-9333-1.

W. Sun, C. Fang, Y. You, et al., Automatic Code Summarization via ChatGPT: How Far Are We? arXiv:2305.12865 [cs], May 2023. DOI: 10.48550/arXiv.2305.12865.

J. Cao, M. Li, M. Wen, and S.-c. Cheung, “A study on Prompt Design, Advantages and Limitations of ChatGPT for Deep Learning Program Repair,” 2023. DOI: 10.48550/ARXIV.2304.08191.

Z. Feng, D. Guo, D. Tang, et al., “CodeBERT: A Pre-Trained Model for Programming and Natural Languages,” en, in Findings of the Association for Computational Linguistics: EMNLP 2020, Online: Association for Computational Linguistics, 2020, pp. 1536–1547. DOI: 10.18653/v1/2020.findings-emnlp.139.

Y. Wang, H. Le, A. D. Gotmare, N. D. Q. Bui, J. Li, and S. C. H. Hoi, CodeT5+: Open Code Large Language Models for Code Understanding and Generation, arXiv:2305.07922 [cs], May 2023.

M. Kajko-Mattsson, “A Survey of Documentation Practice within Corrective Maintenance,” en, Empirical Software Engineering, vol. 10, no. 1, pp. 31–55, Jan. 2005, ISSN: 1382-3256. DOI: 10.1023/B:LIDA.0000048322.42751.ca.

S. S. Dvivedi, V. Vijay, S. L. R. Pujari, S. Lodh, and D. Kumar, A Comparative Analysis of Large Language Models for Code Documentation Generation, arXiv:2312.10349 [cs], Dec. 2023.

C.-Y. Su and C. McMillan, Distilled GPT for Source Code Summarization, arXiv:2308.14731 [cs], Aug. 2023. DOI: 10.48550/arXiv.2308.14731.

T. Ahmed and P. Devanbu, “Few-shot training LLMs for project-specific code-summarization,” en, in Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, Rochester MI USA: ACM, Oct. 2022, pp. 1–5, ISBN: 978-1-4503-9475-8. DOI: 10.1145/3551349.3559555.

M. Geng, S. Wang, D. Dong, et al., “Large Language Models are Few-Shot Summarizers: Multi-Intent Comment Generation via In-Context Learning,” 2023. DOI: 10.48550/ARXIV.2304.11384.

A. H. Mohammadkhani, C. Tantithamthavorn, and H. Hemmati, Explainable AI for Pre-Trained Code Models: What Do They Learn? When They Do Not Work? arXiv:2211.12821 [cs], Aug. 2023.

Y. Wang, W. Wang, S. Joty, and S. C. Hoi, “CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation,” en, in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Online and Punta Cana, Dominican Republic: Association for Computational Linguistics, 2021, pp. 8696–8708. DOI: 10.18653/v1/2021.emnlp-main.685.

OpenAI, J. Achiam, S. Adler, et al., GPT-4 Technical Report, arXiv:2303.08774 [cs], Mar. 2024.

T. Kajiura, N. Souma, M. Sato, M. Takahashi, and K. Kuramitsu, “An additional approach to pre-trained code model with multilingual natural languages,” in 2022 29th Asia-Pacific Software Engineering Conference (APSEC), ISSN: 2640-0715, Dec. 2022, pp. 580–581. DOI: 10.1109/APSEC57359.2022.00090.

K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “BLEU: A method for automatic evaluation of machine translation,” en, in Proceedings of the 40th Annual Meeting on Association for Computational Linguistics - ACL ’02, Philadelphia, Pennsylvania: Association for Computational Linguistics, 2002, pp. 311–318. DOI: 10.3115/1073083.1073135.

C.-Y. Lin, “ROUGE: A Package for Automatic Evaluation of Summaries,” in Text Summarization Branches Out, Barcelona, Spain: Association for Computational Linguistics, Jul. 2004, pp. 74–81.

E. Reiter, “A Structured Review of the Validity of BLEU,” en, Computational Linguistics, vol. 44, no. 3, pp. 393–401, Sep. 2018, ISSN: 0891-2017, 1530-9312. DOI: 10.1162/coli_a_00322.

S. A. Rukmono, L. Ochoa, and M. R. Chaudron, “Achieving High-Level Software Component Summarization via Hierarchical Chain-of-Thought Prompting and Static Code Analysis,” in 2023 IEEE International Conference on Data and Software Engineering (ICoDSE), Toba, Indonesia: IEEE, Sep. 2023, pp. 7–12, ISBN: 9798350381382. DOI: 10.1109/ICoDSE59534.2023.10292037.

Published

2024-08-20

How to Cite

Kneidinger, M., Feneberger, M., & Plösch, R. (2024). Using GPT-4 for Source Code Documentation. WiPiEC Journal - Works in Progress in Embedded Computing Journal, 10(2). Retrieved from https://wipiec.digitalheritage.me/index.php/wipiecjournal/article/view/74