TaxoCite: Hierarchical Topic-Enriched Citation Intent Classification for Scientific Literature
March 4, 2026 11:00 am (Central Time)
Abstract
Citation intent classification aims to identify how prior work is used within scientific papers. However, existing task formulations rely on coarse, flat, and topic-agnostic label spaces that fail to capture the fine-grained and structured roles citations play in practice. To address this gap, we introduce hierarchical topic-enriched citation intent classification, a new task setting that models citation usage through hierarchical classification over two complementary taxonomies: an intent taxonomy capturing general functional roles, and a topic-specific taxonomy capturing domain semantics. To support this task, we construct HiToBench, a human-annotated benchmark of 5,160 citation instances from 100 papers across computer science, geospatial science, and chemistry, labeled by domain experts under principled guidelines. We further propose TaxoCite, a test-time scaling LLM pipeline that integrates topic–intent interaction, bidirectional consistency, and majority voting refinement. Experiments demonstrate that the proposed method consistently outperforms strong LLM baselines. Overall, this work advances citation intent analysis from flat classification toward a fine-grained, structured, and topic-aware understanding of scientific citation usage.
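As a rough illustration of how majority voting can be applied to hierarchical labels, the sketch below votes one taxonomy level at a time over several sampled label paths. It is not TaxoCite's actual refinement step, which the abstract does not specify; the function name `vote_label_path` and the intent labels are hypothetical and chosen only for the example.

```python
from collections import Counter


def vote_label_path(sampled_paths):
    """Pick one label path by majority vote, one taxonomy level at a time.

    sampled_paths: iterable of label paths (root-to-leaf sequences), e.g. the
    outputs of several independent LLM samples for the same citation.
    Hypothetical helper for illustration; not the authors' implementation.
    """
    chosen = []
    candidates = [tuple(p) for p in sampled_paths]
    while True:
        depth = len(chosen)
        # Labels proposed at this level by samples that agree with the prefix so far.
        labels = [p[depth] for p in candidates if len(p) > depth]
        if not labels:
            break
        # Ties go to the label that was seen first.
        winner, _ = Counter(labels).most_common(1)[0]
        chosen.append(winner)
        # Keep only the samples consistent with the winning label.
        candidates = [p for p in candidates if len(p) > depth and p[depth] == winner]
    return tuple(chosen)


# Hypothetical intent labels; five sampled predictions for one citation.
samples = [
    ("Background", "Motivation"),
    ("Method", "Uses"),
    ("Method", "Extends"),
    ("Method", "Uses"),
    ("Background", "Motivation"),
]
print(vote_label_path(samples))
```

In this toy input, voting on whole paths would tie between two candidates, while level-wise voting first settles the parent category and then resolves the child among the samples that agree with it.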
Speakers
Runchu Tian
University of Illinois Urbana-Champaign
Patrick Xu
University of Illinois Urbana-Champaign