PICO Portal Presents at ISPOR 2026


PICO Portal is proud to present its latest research at ISPOR 2026 in Philadelphia. Stop by our poster session (details below) or book a time to meet with us 1:1.

Poster Session 1 – Monday, May 18 10:30am – 1:30pm

AI-Assisted Risk Of Bias Assessment In Observational Studies: Isolating Judgment Using A Passage-Based Validation Approach (MSR29)
AUTHORS: Eitan Agai, Karen A. Robinson, Alon Agai

OBJECTIVES: To evaluate whether a large language model (LLM) can reproduce expert risk of bias (RoB) judgments when the evidence is held constant, by isolating and validating the assessment step rather than evidence identification.

METHODS: We conducted a passage-based validation study using 128 observational studies from a systematic review of per- and polyfluoroalkyl substances (PFAS) and health outcomes. Two experienced reviewers assessed RoB across nine Navigation Guide domains and identified the text passages (evidence) supporting each judgment. These human-selected passages, together with the corresponding Navigation Guide domain questions and guidance, were provided to an LLM. The model was not tasked with locating evidence or reviewing full-text articles. For each passage–domain pair, the LLM generated a structured RoB rating on a five-level scale. Model outputs were compared with human-adjudicated ratings using exact agreement, acceptable agreement, percent agreement, and weighted Cohen's kappa, overall and by domain.
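The agreement statistics named above can be illustrated with a short sketch. This is not the study's actual analysis code; it assumes the five-level RoB scale is coded as integers 0–4 and shows plain percent agreement alongside a weighted Cohen's kappa, where disagreement cost grows with ordinal distance between ratings.

```python
from collections import Counter

def percent_agreement(ratings_a, ratings_b):
    """Fraction of items on which the two raters give the exact same level."""
    return sum(a == b for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)

def weighted_kappa(ratings_a, ratings_b, n_levels=5, weighting="quadratic"):
    """Weighted Cohen's kappa for ordinal ratings coded 0..n_levels-1.

    Disagreements are penalized by ordinal distance (linear or quadratic),
    so a one-level discrepancy counts far less than an opposing rating.
    """
    n = len(ratings_a)
    power = 2 if weighting == "quadratic" else 1

    def weight(i, j):  # normalized disagreement cost in [0, 1]
        return (abs(i - j) / (n_levels - 1)) ** power

    # Observed weighted disagreement across rated items.
    observed = sum(weight(a, b) for a, b in zip(ratings_a, ratings_b)) / n

    # Expected disagreement if raters were independent with the same marginals.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(
        weight(i, j) * counts_a[i] * counts_b[j]
        for i in range(n_levels)
        for j in range(n_levels)
    ) / (n * n)

    # Kappa = 1 when observed disagreement is zero; undefined if expected is 0
    # (i.e., both raters assign one identical constant level).
    return 1 - observed / expected

# Illustrative (made-up) ratings for a handful of passage-domain pairs.
human = [0, 1, 2, 3, 4, 2, 1, 0]
llm   = [0, 1, 2, 3, 3, 2, 1, 1]
print(percent_agreement(human, llm))   # 0.75
print(weighted_kappa(human, llm))      # below 1.0: near-miss disagreements
```

The quadratic weighting is one common choice for ordinal scales; the abstract does not specify which weighting scheme the authors used.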

RESULTS: Agreement between LLM-generated and human-adjudicated RoB ratings varied by domain. Concordance was highest in domains relying on explicit reporting, with agreement ranging from 95.1% to 100%, and lower in domains requiring contextual judgment, with agreement ranging from 76.3% to 89.0%. Most discrepancies reflected partial or acceptable agreement rather than opposing assessments, indicating partial alignment in judgment when evidence was shared.

CONCLUSIONS: Separating evidence identification from assessment enables targeted evaluation of AI judgment quality and provides a clearer foundation for responsible integration of AI into evidence synthesis. When restricted to human-curated evidence passages within observational studies, LLMs demonstrated reasoning and assessment similar to that of a human RoB assessor.

Curious about how to enhance your organization’s review workflows? Click here to book a time with us during the conference.