Portfolio item number 1
Published:
Short description of portfolio item number 1
Published:
Short description of portfolio item number 2
Published in Findings of EMNLP 2025, 2025
LLMEval-Med is a physician-validated clinical benchmark built from real-world electronic health records and expert-designed scenarios. It targets the weaknesses of existing medical LLM evaluations by moving beyond exam-style questions toward realistic clinical reasoning and checklist-based expert assessment.
Published in arXiv preprint, 2025
LLMEval-Fair proposes a dynamic evaluation framework that samples unseen test sets from a large question bank, combines contamination-resistant curation with anti-cheating design, and studies almost 50 frontier models longitudinally to produce a more reliable picture of progress than static leaderboards.
Published in arXiv preprint, 2026
OpenNovelty studies whether language models can judge the novelty of open-ended ideas rather than only solve fixed-answer tasks. It introduces an open-domain benchmark for comparing LLM judgments of novelty across research-oriented scenarios, aiming to better understand how models support scientific creativity and evaluation.
Published:
This is a description of your talk, which is a markdown file that can be all markdown-ified like any other post. Yay markdown!
Published:
This is a description of your conference proceedings talk; note the different value in the `type` field. You can put anything in this field.
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.