Hey, I’m Yi Ma, a fourth-year PhD candidate in the College of Intelligence and Computing at Tianjin University and a member of Professor Jianye Hao’s research group. My research interests are offline reinforcement learning and applications of RL. I have published more than 20 papers in top AI conferences. Outside of research, I’m a huge fan of basketball, snowboarding, and orienteering.

I’m interested in:

  • Reinforcement Learning
  • Offline Reinforcement Learning
  • Applications of Deep Reinforcement Learning

🎓 Education

  • 2020.09 - 2024.06, Tianjin University, PhD
  • 2018.09 - 2020.06, Tianjin University, Master's
  • 2014.09 - 2018.06, Tianjin University, Bachelor's

📝 Selected Publications


PS: Authors with equal contribution are marked by *.

ICML 2024

Yi Ma, Jianye Hao, Hebin Liang, Chenjun Xiao.
Rethinking Decision Transformer via Hierarchical Reinforcement Learning.
ICML 2024. (CCF A)

ICML 2024

Jiashun Liu, Jianye Hao, Yi Ma, Shuyin Xia.
Imagine Big from Small: Unlock the Cognitive Generalization of Deep Reinforcement Learning from Simple Scenarios.
ICML 2024. (CCF A)

IJCAI 2024

Kai Zhao, Jianye Hao, Yi Ma, Jinyi Liu, Yan Zheng, Zhaopeng Meng.
ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles.
IJCAI 2024. (CCF A)

ICLR 2024

Yifu Yuan, Jianye Hao, Yi Ma, Zibin Dong, Hebin Liang, Jinyi Liu, Zhixin Feng, Kai Zhao, Yan Zheng.
Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback.
ICLR 2024. (Top AI Conference)

NeurIPS 2023

Yi Ma, Hongyao Tang, Dong Li, Zhaopeng Meng.
Reining Generalization in Offline Reinforcement Learning via Representation Distinction.
NeurIPS 2023. (CCF A)

CIKM 2023

Hebin Liang, Zibin Dong, Yi Ma, Xiaotian Hao, Yan Zheng, Jianye Hao.
A Hierarchical Imitation Learning-based Decision Framework for Autonomous Driving.
CIKM 2023. (CCF B)

AAAI 2023

Hebin Liang*, Yi Ma*, Zilin Cao, Tianyang Liu, Fei Ni, Zhigang Li, Jianye Hao.
SplitNet: A Reinforcement Learning based Sequence Splitting Method for the MinMax Multiple Travelling Salesman Problem.
AAAI 2023. (CCF A)


Yi Ma, Chao Wang, Chen Chen, Jinyi Liu, Zhaopeng Meng, Yan Zheng, Jianye Hao.
OSCAR: OOD State Conservative Offline Reinforcement Learning for Sequential Decision Making.
CAAI Artificial Intelligence Research 2023.

IJCAI 2022

Tong Sang, Hongyao Tang, Yi Ma, Jianye Hao, Yan Zheng, Zhaopeng Meng, Boyan Li, Zhen Wang.
PAnDR: Fast Adaptation to New Environments from Offline Experiences via Decoupling Policy and Environment Representations.
IJCAI 2022. (CCF A)

NeurIPS 2021

Yi Ma*, Xiaotian Hao*, Jianye Hao, Jiawen Lu, Xing Liu, Xialiang Tong, Mingxuan Yuan, Zhigang Li, Jie Tang, Zhaopeng Meng.
A hierarchical reinforcement learning based optimization framework for large scale dynamic pickup and delivery problems.
NeurIPS 2021. (CCF A)

KDD 2021

Fei Ni, Jianye Hao, Jiawen Lu, Xialiang Tong, Mingxuan Yuan, Jiahui Duan, Yi Ma, Kun He.
A Multi-Graph Attributed Reinforcement Learning based Optimization Algorithm for Large-scale Hybrid Flow Shop Scheduling Problem.
KDD 2021. (CCF A)

ICML 2020

Xiaotian Hao*, Zhaoqing Peng*, Yi Ma*, Guan Wang, Junqi Jin, Jianye Hao, Shan Chen, Rongquan Bai, Mingzhou Xie, Miao Xu, Zhenzhe Zheng, Chuan Yu, Han Li, Jian Xu, Kun Gai.
Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising.
ICML 2020. (CCF A)

📄 Patents


  • CN113850414B. Method for Logistics Scheduling Planning Based on Graph Neural Networks and Reinforcement Learning. (First Inventor, Granted)
  • CN114130034B. Multi-Agent Game AI Design Method Based on Attention Mechanism and Reinforcement Learning. (Fifth Inventor, Granted)
  • CN113947348A. A Method and Device for Order Allocation. (Second Inventor, Under Review)
  • CN113869489A. Complex Game AI Design Method Based on Hierarchical Deep Reinforcement Learning. (Third Inventor, Under Review)
  • CN113869488A. Reinforcement Learning Method for Game AI Agents in Continuous-Discrete Mixed Decision Environments. (Third Inventor, Under Review)
  • CN114169421A. Cooperative Exploration Method in Sparse-Reward Environments for Multi-Agent Systems Based on Intrinsic Motivation. (Fourth Inventor, Under Review)
  • CN114139681A. Meta-Reinforcement Learning Method Based on Contrastive Learning and Mutual Information. (Fourth Inventor, Under Review)

🏅 Competitions and Honors

  • 2022.12, NeurIPS 2022 SMARTS Autonomous Driving Competition. First Prize in both tracks.
  • 2021.06, Huawei 2012 Central Research Institute Innovation Pioneer. President's Award Second Prize.
  • 2017.12, Intel Cup National College Students Software Innovation Competition, National Finals. Third Prize.

🏛️ Invited Talks

  • 2023.12, Reining Generalization in Offline Reinforcement Learning via Representation Distinction. @DAI 2023
  • 2022.01, A hierarchical reinforcement learning based optimization framework for large scale dynamic pickup and delivery problems. @RLChina

💻 Internships

  • 2020.11-2023.11, Huawei Noah’s Ark Lab, Decision and Reasoning Team. Supervised by Chenjun Xiao, Dong Li, Chen Chen and Chao Wang.
  • 2020.04-2020.10, Huawei Noah’s Ark Lab, Enterprise Intelligence Team. Supervised by Jiawen Lu.
  • 2019.07-2019.12, Alibaba, Alimama Target Advertising Team. Supervised by Junqi Jin.