Informing Climate Risk Analysis Using Textual Information - A Research Agenda

Abstract

We present a research agenda focused on efficiently extracting, assuring quality, and consolidating textual company sustainability information to address urgent climate change decision-making needs. Starting from the goal to create integrated FAIR (Findable, Accessible, Interoperable, Reusable) climate-related data, we identify research needs pertaining to the technical aspects of information extraction as well as to the design of the integrated sustainability datasets that we seek to compile. Regarding extraction, we leverage technological advancements, particularly in large language models (LLMs) and Retrieval-Augmented Generation (RAG) pipelines, to unlock the underutilized potential of unstructured textual information contained in corporate sustainability reports. In applying these techniques, we review key challenges, which include the retrieval and extraction of CO2 emission values from PDF documents, especially from unstructured tables and graphs therein, and the validation of automatically extracted data through comparisons with human-annotated values. We also review how existing use cases and practices in climate risk analytics relate to choices of what textual information should be extracted and how it could be linked to existing structured data.

inproceedings


ClimateNLP @ACL 2024

1st Workshop on Natural Language Processing Meets Climate Change at the 62nd Annual Meeting of the Association for Computational Linguistics. Bangkok, Thailand, Aug 11-16, 2024.

Authors

A. Dimmelmeier • H. Doll • M. Schierholz • E. Kormanyos • M. Fehr • B. Ma • J. Beck • A. Fraser • F. Kreuter

Research Areas

 B2 | Natural Language Processing

 C4 | Computational Social Sciences

BibTeXKey: DDS+24