Tapping in a Remote Vehicle’s onboard LLM to Complement the Ego Vehicle’s Field-of-View

Authors

  • Malsha Ashani Mahawatta Dona, University of Gothenburg
  • Beatriz Cabrero-Daniel, University of Gothenburg
  • Yinan Yu, Chalmers University of Technology
  • Christian Berger, University of Gothenburg

Keywords

Pedestrian Detection, Cyber-Physical Systems (CPS), Cooperative Intelligent Transportation Systems (C-ITS), Large Language Models, Generative AI, Vehicle-to-Vehicle, V2V

Abstract

Today’s advanced automotive systems are turning into intelligent Cyber-Physical Systems (CPS), bringing computational intelligence to their cyber-physical context. Such systems power advanced driver assistance systems (ADAS) that observe a vehicle’s surroundings for their functionality. However, such ADAS have clear limitations in scenarios where the direct line-of-sight to surrounding objects is occluded, as in urban areas. Automated driving (AD) systems could ideally benefit from other vehicles’ field-of-view in such occluded situations to increase traffic safety if, for example, the locations of pedestrians can be shared across vehicles. Current literature suggests addressing such issues with vehicle-to-infrastructure (V2I) communication via roadside units (RSUs) or vehicle-to-vehicle (V2V) communication that streams sensor or object data between vehicles. Considering the ongoing revolution in vehicle system architectures towards powerful, centralized processing units with hardware accelerators, the onboard presence of large language models (LLMs) to improve passengers’ comfort when using voice assistants is becoming a reality. We suggest and evaluate a concept to complement the ego vehicle’s field-of-view (FOV) with another vehicle’s FOV by tapping into that vehicle’s onboard LLM to let the machines have a dialogue about what the other vehicle “sees”. Our results show that very recent versions of LLMs, such as GPT-4V and GPT-4o, understand a traffic situation to an impressive level of detail, and hence they can even be used to spot traffic participants. However, better prompts are needed to improve detection quality, and future work is needed towards a standardised message interchange format between vehicles.
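To make the concept concrete, the following is a minimal sketch of the machine-to-machine dialogue described above, assuming the official OpenAI Python SDK and its vision-capable chat API. The frame file name, prompt wording, and JSON reply schema are illustrative assumptions made for this sketch; they are not the authors’ implementation, nor a proposed interchange standard.

import base64
import json

from openai import OpenAI  # official OpenAI Python SDK

# Hypothetical camera frame shared by the remote vehicle (illustrative).
REMOTE_FRAME = "remote_vehicle_frame.jpg"

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def encode_frame(path: str) -> str:
    """Base64-encode a camera frame for the vision API."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


# The ego vehicle's query to the remote vehicle's onboard LLM. The JSON
# schema is made up for this sketch; a standardised interchange format
# is exactly what the paper names as future work.
EGO_QUERY = (
    "You are the onboard assistant of a vehicle. Inspect the attached "
    "camera frame and list every pedestrian you can see. Reply with JSON "
    'of the form {"pedestrians": [{"description": str, '
    '"position": "left|center|right", "crossing": bool}]}.'
)

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},  # force machine-readable output
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": EGO_QUERY},
            {"type": "image_url", "image_url": {
                "url": f"data:image/jpeg;base64,{encode_frame(REMOTE_FRAME)}",
            }},
        ],
    }],
)

# The ego vehicle would fuse this answer into its own world model to
# complement its occluded field-of-view.
print(json.loads(response.choices[0].message.content))

In a deployed C-ITS setting, the query and reply would travel over a V2V link rather than a cloud API, and the reply would follow whatever message interchange format is eventually standardised.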



Published

2024-08-20

How to Cite

Ashani Mahawatta Dona, M., Cabrero-Daniel, B., Yu, Y., & Berger, C. (2024). Tapping in a Remote Vehicle’s onboard LLM to Complement the Ego Vehicle’s Field-of-View. WiPiEC Journal - Works in Progress in Embedded Computing Journal, 10(2). Retrieved from https://wipiec.digitalheritage.me/index.php/wipiecjournal/article/view/70