Sikun Xu

I am a Ph.D. candidate at Olin Business School, Washington University in St. Louis. I’m fortunate to be advised by Prof. Dennis Zhang and Prof. Raphael Thomadsen. I study how firms can make reliable decisions using modern machine learning and AI systems when the available data is noisy and high-dimensional.

I’m on the 2026-2027 job market!

Contacts

  • sikun [at] wustl [dot] edu
  • xusikun96 [at] gmail [dot] com

Education

  • Olin Business School, Washington University in St. Louis (2021-present)
    • Ph.D. Candidate
    • Dissertation Title: Data-Driven Business Decision-Making with Causal Inference and Machine Learning
  • Columbia University in the City of New York (2019-2020)
    • M.S. in Operations Research
    • Data Science Institute Scholar
  • Shanghai Jiao Tong University (2015-2019)
    • B.S. in Industrial Engineering

Job Market Papers

  1. The Winner’s Curse in Data-Driven Decision-Making: Evidence and Solutions
    Submitted. Accepted to SICS 2025.
    SSRN
    Abstract

    Data-driven decision making involves estimating the value of each potential option and selecting the one with the highest estimated efficacy. This approach underpins a wide array of modern marketing, operations, and AI applications, including A/B testing, advertising and bidding, pricing, and personalized targeting. However, several papers have shown that the estimated effectiveness of the chosen options will be systematically over-optimistic (Smith and Winkler 2006, Efron 2011, Andrews et al. 2024), even when the estimated outcomes are themselves unbiased and efficient. Using simulations calibrated to realistic parameter values from recent marketing studies, we first demonstrate that the magnitude of the winner’s curse is often high in relevant marketing contexts, and that the severity of the winner’s curse depends on the true performance difference between options relative to the level of noise in the data, the number of alternatives under consideration, and the number of observations per tested condition. We further show that using machine learning methods to evaluate what treatment to give to individual consumers can lead to extremely high levels of winner’s curse, especially if the machine learning functional form is very flexible. We propose a correction method based on a non-continuous bootstrap, and benchmark our method against several existing proposed solutions across many common marketing scenarios. We demonstrate that our bootstrap approach generally performs well, and usually outperforms the solutions that have been previously proposed in the literature.

  2. SOTA or Luck? The Winner’s Curse in LLM Leaderboards
    Submitted.
    Abstract

    Public LLM leaderboards are now central to how models are evaluated, compared, and publicized, yet we show that the reported performance of top-ranked models can be systematically overstated. The reason is a leaderboard winner’s curse (WC): benchmark scores are noisy averages over finite task sets, so ranking models in decreasing order of score rewards sampling luck alongside genuine ability. This overstatement is large enough to make the apparent winner statistically fragile: in four of the five SWE-bench variants we study, the published rank-1 model has less than a 50% bootstrap probability of remaining rank 1 under task resampling. We develop an $m$-out-of-$N$ task bootstrap for iid benchmark tasks that allows arbitrary within-task correlation among models. We prove that the worst-case rank-induced bias (of which WC is the special case for the first rank) is $\Theta(1/\sqrt{N})$ in the number of tasks $N$, and that our correction reduces it to $o(1/\sqrt{N})$. We also introduce a rank probability matrix that replaces a single deterministic ranking with a distribution over plausible rankings. Empirically, rank-1 inflation reaches 1–3 percentage points across eight benchmark settings spanning code generation and agentic customer service. On SWE-bench Verified, selection bias explains 47% of the winner’s apparent top-five lead, and the top 28 models are statistically indistinguishable at the 5% level. Our results suggest that benchmark designers should either enlarge task pools substantially to suppress the WC or report bias-corrected scores and ranking uncertainty alongside raw rankings.

Working Papers

  1. A Causal Approach to Representation Learning for Unstructured Data
    Major revision at Management Science. Accepted to 19th Annual Bass FORMS Conference (2025).
    SSRN
    Abstract

    The increasing availability of unstructured data (e.g., images) in business and economics research has created new opportunities to control for confounders. A common approach is embedding-then-inference, where unstructured data is compressed into low-dimensional embeddings and incorporated into causal models. However, we show that this method can introduce significant bias because representation learning models optimized for reconstruction may miss relevant confounders. To address this, we propose causal embeddings, which explicitly align the objective of representation learning with the causal task by jointly predicting both treatment and outcome variables. This approach captures confounding information while maintaining low-dimensional efficiency and accommodates various embedding methods, including fine-tuned pretrained models. Simulations demonstrate that causal embeddings outperform both embedding-then-inference and direct adjustment with double machine learning (DML) in subsequent causal inference tasks. A real-world application further highlights the practical importance of properly accounting for unstructured data in causal models.

Conference Proceedings

  1. Verifying Global Optimality of Candidate Solutions to Polynomial Optimization Problems using a Determinant Relaxation Hierarchy
    60th IEEE Conference on Decision and Control (2021).
    IEEE
    Abstract

    We propose an approach for verifying that a given feasible point for a polynomial optimization problem is globally optimal. The approach relies on the Lasserre hierarchy and the result of Lasserre regarding the importance of the convexity of the feasible set as opposed to that of the individual constraints. By focusing solely on certifying global optimality and relaxing the Lasserre hierarchy using necessary conditions for positive semidefiniteness based on matrix determinants, the proposed method is implementable as a computationally tractable linear program. We demonstrate this method via application to several instances of polynomial optimization, including the optimal power flow problem used to operate electric power systems.

Work in Progress

  1. Policy Learning with Noncompliant AI Agents
  2. Exploration Without Noise: Quality-Gated Learning in Platform Recommendations
  3. Generative Learning-to-Rank in Recommendation Systems

Conference Presentations

  • The Winner’s Curse in Data-Driven Decision-Making: Evidence and Solutions
    • 2025 INFORMS Annual Meeting (Atlanta)
    • 2025 INFORMS Marketing Science Conference (Washington, D.C.)
    • 2024 Conference on Artificial Intelligence, Machine Learning, and Business Analytics (Yale)
  • A Causal Approach to Representation Learning for Unstructured Data
    • 2025 ISMS Marketing Science Conference (Washington, D.C.)
    • 2024 INFORMS Annual Meeting (Seattle, Session Chair)
    • 2024 POMS Annual Conference (Minneapolis)
    • 2023 INFORMS Annual Meeting (Phoenix)
  • Data-Driven Security Selection for Wealth Management
    • 2023 POMS Annual Conference (Orlando)
    • 2022 INFORMS Annual Meeting (Indianapolis)
  • Verifying Global Optimality of Candidate Solutions to Polynomial Optimization Problems using a Determinant Relaxation Hierarchy [slide]
    • 2021 INFORMS Annual Meeting (Virtual)

Teaching

Guest Lecturer

  • Washington University in St. Louis (MGT680E, 2024 Fall)
  • Columbia University (IEOR4721, 2022 Spring)
  • Columbia University (IEOR4721, 2021 Summer)

Teaching Assistant

Columbia University in the City of New York

  • IEOR4742 Deep Learning; FL2020
  • IEOR4525 Machine Learning; SP2020

Washington University in St. Louis

  • SCOT519E Revenue Management; FL2022
  • SCOT5704 Operations Management; FL2022
  • SCOT500D Project Management; FL2022, SP2023
  • SCOT500M Supply Chain Analytics: Stochastic Models; SP2023, SP2024
  • SCOT400D Supply Chain Analytics; SP2023
  • SCOT558 Advanced Operations Strategy; FL2023
  • SCOT356 Operations and Manufacturing Management; FL2024
  • MGT680E AI & Machine Learning Business Applications; FL2024

Academic Service

  • Session Chair at INFORMS Annual Meeting (2024)
  • Reviewer for Journal of Investment Strategies