Hannah Cyberey, Yangfeng Ji, David Evans. Unsupervised Concept Vector Extraction for Bias Control in LLMs. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP). Nov 2025.
Hannah Cyberey, David Evans. Steering the CensorShip: Uncovering Representation Vectors for LLM “Thought” Control. In the Conference on Language Modeling (COLM). Oct 2025.
Hannah Cyberey, Yangfeng Ji, David Evans. Do Prevalent Bias Metrics Capture Allocational Harms from LLMs? In Proceedings of the Fifth Workshop on Insights from Negative Results in NLP. May 2025.
Hannah Cyberey, Yangfeng Ji, David Evans. Addressing Both Statistical and Causal Gender Fairness in NLP Models. In Findings of the Association for Computational Linguistics: NAACL 2024. Jun 2024.
Hannah Cyberey, Yangfeng Ji, David Evans. Balanced Adversarial Training: Balancing Tradeoffs Between Oversensitivity and Undersensitivity in NLP Models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP). Oct 2022.
Hannah Cyberey, Yangfeng Ji, David Evans. Finding Friends and Flipping Frenemies: Automatic Paraphrase Dataset Augmentation Using Graph Theory. In Findings of the Association for Computational Linguistics: EMNLP 2020. Nov 2020.
Hannah Cyberey, Yangfeng Ji, David Evans. Pointwise Paraphrase Appraisal Is Potentially Problematic. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop. Jul 2020.