Decoding Sentiment with Large Language Models
Comparing Prompting Strategies Across Hard, Soft, and Subjective Label Scenarios
Abstract
This study evaluates the performance of different sentiment analysis methods in the context of public deliberation, focusing on hard-, soft-, and subjective-label scenarios to answer the research question: "Can a large language model detect the subjective sentiment of statements within the context of public deliberation?" An affirmative answer would be a strong indicator that, supported by longitudinal studies, sentiment analysis with large language models (LLMs) could be used to scale public deliberations by supporting moderators in such discussions. To answer this question, four prompting methods were tested: zero-shot, few-shot, chain-of-thought (CoT) zero-shot, and CoT few-shot, using a Frisian dataset of 50 statements annotated by 5 annotators. The findings indicate that the CoT few-shot method significantly outperforms the other methods in all scenarios, that soft labels outperform their hard equivalents, that the underlying data must be balanced for high-performing models, and that capturing the perspective of a specific annotator requires further research. Our study suggests that, due to the multi-faceted nature of sentiment, LLMs may perform best under human supervision or in collaboration with a human.
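For readers unfamiliar with the four prompting strategies compared above, the minimal sketch below shows how zero-shot, few-shot, CoT zero-shot, and CoT few-shot prompts might be composed for a sentiment-classification task. The prompt wording, sentiment labels, and example statements are illustrative assumptions (English stand-ins for the Frisian data), not the authors' exact prompts.

```python
# Illustrative sketch of the four prompting strategies; labels, wording,
# and exemplars are assumptions, not the prompts used in the study.

LABELS = ["positive", "neutral", "negative"]  # assumed sentiment classes

# Hypothetical few-shot exemplars standing in for annotated Frisian statements.
EXAMPLES = [
    ("I fully support this proposal for the neighbourhood.", "positive"),
    ("I am not sure yet what to think of the plan.", "neutral"),
    ("This decision ignores the concerns of local residents.", "negative"),
]


def build_prompt(statement: str, few_shot: bool = False, cot: bool = False) -> str:
    """Compose a prompt for one of the four strategies:
    zero-shot, few-shot, CoT zero-shot, or CoT few-shot."""
    parts = [
        "You are analysing statements from a public deliberation.",
        f"Classify the sentiment of the statement as one of: {', '.join(LABELS)}.",
    ]
    if few_shot:
        # Few-shot: prepend labelled example statements.
        parts.append("Examples:")
        for text, label in EXAMPLES:
            parts.append(f'Statement: "{text}"\nSentiment: {label}')
    if cot:
        # Chain-of-thought: ask the model to reason before answering.
        parts.append("Think step by step about the speaker's stance before giving the label.")
    parts.append(f'Statement: "{statement}"\nSentiment:')
    return "\n\n".join(parts)


if __name__ == "__main__":
    statement = "The new plan finally gives residents a real voice."
    for few_shot in (False, True):
        for cot in (False, True):
            name = ("CoT " if cot else "") + ("few-shot" if few_shot else "zero-shot")
            print(f"--- {name} ---")
            print(build_prompt(statement, few_shot=few_shot, cot=cot))
            print()
```

In a hard-label setting the model's single predicted label would be compared against a majority vote of the annotators; in soft- or subjective-label settings the same prompts could instead be scored against the annotators' label distribution or against one annotator's individual labels.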