About

I am a Software Engineering student at Fudan University. My current research focuses on trustworthy evaluation of large language models, medical and real-world benchmarks, and scientific-intelligence systems for evaluating novelty and reasoning quality.

I am especially interested in evaluation pipelines that are more robust to contamination, closer to real use cases, and better aligned with the kinds of claims we actually make about modern models.

Selected Publications

Current Focus

Robust evaluation, Contamination resistance, Medical NLP, Expert validation, Novelty assessment, Scientific review