Mohammad Taufeeque

Research Engineer, FAR.AI

I am a research engineer at FAR.AI. My current research interests are scalable interpretability, post-training interventions that robustly preserve values like honesty across contexts and personas, and improving the introspective awareness of LLMs.

At FAR, my prior work has included scalable interpretability via sparse codebook features, mechanistic analysis of goals and planning in recurrent agents, red-teaming frontier LLMs, and preserving honesty during RL training with deception probes.

I graduated from IIT Bombay with a B.Tech in Computer Science, where my bachelor’s thesis with Prof. Shivaram Kalyanakrishnan won the NeurIPS 2021 Reconnaissance Blind Chess competition. Previously, I interned at Microsoft Research with Prof. Sunita Sarawagi and Dr. Sriram Rajamani, and at TU Braunschweig with Prof. Thomas Deserno.


Publications

  1. The Obfuscation Atlas: Mapping Where Honesty Emerges in RLVR with Deception Probes
    Mohammad Taufeeque, Stefan Heimersheim, Adam Gleave, and Chris Cundy
    In Forty-third International Conference on Machine Learning, 2026
    Spotlight
  2. Path Channels and Plan Extension Kernels: a Mechanistic Description of Planning in a Sokoban RNN
    In The Fourteenth International Conference on Learning Representations, 2026
    Also appeared as a Spotlight at the Mechanistic Interpretability Workshop, NeurIPS 2025
  3. Planning in a recurrent neural network that plays Sokoban
    arXiv, 2024
    Mechanistic Interpretability Workshop, ICML 2024
  4. Exploiting Novel GPT-4 APIs
    arXiv, 2023
  5. Codebook Features: Sparse and Discrete Interpretability for Neural Networks
    Alex Tamkin, Mohammad Taufeeque, and Noah Goodman
    In Forty-first International Conference on Machine Learning, 2024
  6. imitation: Clean Imitation Learning Implementations
    arXiv, 2022
  7. Fianchetto: Speed, Belief, Guile, Caution to Win at Reconnaissance Blind Chess
    Mohammad Taufeeque*, Nitish Tongia*, and Shivaram Kalyanakrishnan
    Bachelor’s Thesis, 2022
  8. The Second NeurIPS Tournament of Reconnaissance Blind Chess
    In Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, 2022
  9. Multi-camera, multi-person, and real-time fall detection using long short term memory
    Mohammad Taufeeque*, Samad Koita*, Nicolai Spicher, and Thomas M. Deserno
    In Medical Imaging 2021: Imaging Informatics for Healthcare, Research, and Applications, 2021
  10. Randomized POMDP Planning Algorithms
    Mohammad Taufeeque and Shivaram Kalyanakrishnan
    Technical Report, 2021