USF-LVHN SELECT
AI-Powered Workflow for Constructing Organic Materials Databases from the Literature: Integrating Large Language Models.
Publication/Presentation Date
10-28-2025
Abstract
We developed an end-to-end workflow to automate the construction of materials science databases from published literature, addressing a traditionally manual, time-intensive, and labor-intensive process. The work systematically evaluates and compares different machine learning (ML) methods to optimize each task. For identifying relevant publications, we tested various ML techniques and concluded that a combination of large language model (LLM)-based embeddings, clustering, and direct LLM queries is most effective. In the subsequent data extraction phase, we employed OpenAI's GPT-4 to extract materials and their properties, achieving accuracy comparable to manually curated data sets. Additionally, we integrated AI/ML methods to automatically generate SMILES from chemical structure images, expanding the workflow's applicability to organic materials. To validate the workflow, we applied it to the study of organic donor materials in organic photovoltaic devices and benchmarked its performance against a manually curated data set derived from 503 papers. The results demonstrate the workflow's efficiency and accuracy. Finally, based on our findings, we provide recommendations for selecting the best ML methods for each task and propose further improvements for future tool development. This workflow represents a major advancement in accelerating the development of materials science databases and enables data science applications in a broader range of research topics that were historically infeasible due to the lack of available data sets.
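The publication-screening step described in the abstract (embeddings, then clustering, then direct LLM queries) can be illustrated with a minimal sketch. This is not the authors' implementation. The two-dimensional vectors stand in for LLM-based embeddings, the paper titles are invented for illustration, `ask_llm_is_relevant` is a hypothetical placeholder for a direct LLM query (here a plain keyword check), and the choice to query one representative per cluster is an assumption about how the three pieces might be combined.

```python
import math


def farthest_point_init(vectors, k):
    """Pick k starting centroids deterministically: the first vector, then
    repeatedly the vector farthest from the centroids chosen so far."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        nxt = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(vectors, k, iters=20):
    """Plain k-means; returns one cluster label per input vector."""
    centroids = farthest_point_init(vectors, k)
    labels = []
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def ask_llm_is_relevant(title):
    """Hypothetical stand-in for a direct LLM query; a keyword check here."""
    return "photovoltaic" in title.lower()


def screen(papers, k):
    """Cluster papers by embedding, query one representative per cluster,
    and keep every paper in clusters whose representative is relevant."""
    titles = [t for t, _ in papers]
    labels = kmeans([v for _, v in papers], k)
    kept = []
    for c in range(k):
        members = [t for t, lab in zip(titles, labels) if lab == c]
        if members and ask_llm_is_relevant(members[0]):
            kept.extend(members)
    return kept


# Toy corpus: (title, stand-in embedding). All values are illustrative.
PAPERS = [
    ("Donor polymers for organic photovoltaic devices", [0.9, 0.1]),
    ("Small-molecule donors in bulk heterojunction solar cells", [1.0, 0.0]),
    ("Side-chain engineering of OPV donor materials", [0.8, 0.2]),
    ("Corrosion of stainless steel welds", [0.1, 0.9]),
    ("Fatigue in aluminium alloys", [0.0, 1.0]),
    ("Creep behaviour of nickel superalloys", [0.2, 0.8]),
]

if __name__ == "__main__":
    for title in screen(PAPERS, k=2):
        print(title)
```

Querying one representative per cluster keeps the number of LLM calls proportional to the number of clusters rather than the number of papers; a real pipeline would likely sample several members per cluster and use far higher-dimensional embeddings.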
Volume
10
Issue
42
First Page
49545
Last Page
49556
ISSN
2470-1343
Published In/Presented At
Hu, H., Stirrat, H. J., Alayli, A., Saeki, A., & Huang, Y. (2025). AI-Powered Workflow for Constructing Organic Materials Databases from the Literature: Integrating Large Language Models. ACS Omega, 10(42), 49545–49556. https://doi.org/10.1021/acsomega.5c03612
Disciplines
Medical Education | Medicine and Health Sciences
PubMedID
41179185
Department(s)
USF-LVHN SELECT Program, USF-LVHN SELECT Program Students
Document Type
Article