Tapping in a Remote Vehicle’s onboard LLM to Complement the Ego Vehicle’s Field-of-View
Keywords:
Pedestrian Detection, Cyber-Physical Systems (CPS), Cooperative Intelligent Transportation Systems (C-ITS), Large Language Models, Generative AI, Vehicle-to-Vehicle (V2V)

Abstract
Today’s advanced automotive systems are turning into intelligent Cyber-Physical Systems (CPS), bringing computational intelligence to their cyber-physical context. Such systems power advanced driver assistance systems (ADAS) that observe a vehicle’s surroundings to realise their functionality. However, such ADAS have clear limitations in scenarios where the direct line-of-sight to surrounding objects is occluded, as is common in urban areas. Automated driving (AD) systems could ideally benefit from other vehicles’ fields-of-view in such occluded situations to increase traffic safety, for example, if the locations of pedestrians can be shared across vehicles. Current literature suggests vehicle-to-infrastructure (V2I) communication via roadside units (RSUs) or vehicle-to-vehicle (V2V) communication to address such issues by streaming sensor or object data between vehicles. Given the ongoing revolution in vehicle system architectures towards powerful, centralized processing units with hardware accelerators, the onboard presence of large language models (LLMs), for instance to improve passengers’ comfort when using voice assistants, is becoming a reality. We suggest and evaluate a concept to complement the ego vehicle’s field-of-view (FOV) with another vehicle’s FOV by tapping into the latter’s onboard LLM to let the machines have a dialogue about what the other vehicle “sees”. Our results show that very recent LLMs, such as GPT-4V and GPT-4o, understand a traffic situation to an impressive level of detail, and hence can be used even to spot traffic participants. However, better prompts are needed to improve the detection quality, and future work is needed towards a standardised message interchange format between vehicles.
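As a sketch of the concept described above, the following Python snippet illustrates how an ego vehicle might query a remote vehicle’s onboard LLM about pedestrians in an occluded region. The message fields, the `FovQuery` schema, and the `remote_vehicle_llm` stub are illustrative assumptions only — they are not taken from the paper and do not correspond to any standardised V2V message format; a real system would forward the prompt to a vision-capable model (e.g. GPT-4V/GPT-4o) running over live camera frames.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical V2V query message: the ego vehicle asks a remote
# vehicle's onboard LLM what it "sees" near a location it cannot observe.
@dataclass
class FovQuery:
    sender_id: str
    question: str                  # natural-language prompt for the remote LLM
    region_of_interest: tuple      # (lat, lon) occluded from the ego vehicle

# Stub standing in for the remote vehicle's onboard vision LLM; the
# hard-coded reply mimics a structured answer a real model could return.
def remote_vehicle_llm(query: FovQuery) -> dict:
    return {
        "sender_id": "vehicle-B",
        "objects": [
            {"type": "pedestrian", "lat": 57.6890, "lon": 11.9745,
             "confidence": 0.87}
        ],
        "summary": "One pedestrian crossing from the left, occluded "
                   "from your position by a parked truck.",
    }

# Ego vehicle side: serialize the query, "transmit" it, parse the reply.
query = FovQuery(
    sender_id="vehicle-A",
    question="Are there pedestrians near the crossing ahead of me?",
    region_of_interest=(57.6891, 11.9746),
)
payload = json.dumps(asdict(query))            # what would go over the V2V link
reply = remote_vehicle_llm(FovQuery(**json.loads(payload)))
for obj in reply["objects"]:
    print(f'{obj["type"]} at ({obj["lat"]}, {obj["lon"]}), '
          f'confidence {obj["confidence"]}')
```

Serialising the query to JSON marks the boundary where a standardised interchange format would sit; the dialogue itself remains free-form natural language between the two machines.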
License
Copyright (c) 2024 Malsha Ashani Mahawatta Dona, Beatriz Cabrero-Daniel, Yinan Yu, Christian Berger
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
License Terms:
Except where otherwise noted, content on this website is licensed under a Creative Commons Attribution-NonCommercial License (CC BY-NC).
Use, distribution, and reproduction in any medium are permitted, provided the original work is properly cited and is not used for commercial purposes.
Copyright to any article published by WiPiEC is retained by the author(s). Authors grant WiPiEC Journal a license to publish the article and identify itself as the original publisher. Authors also grant any third party the right to use the article freely as long as it is not used for commercial purposes and its original authors, citation details, and publisher are identified, in accordance with the CC BY-NC license. For more information on license terms, click here.