Hey, I’m Yi Ma, a fourth-year PhD candidate in the College of Intelligence and Computing at Tianjin University and a member of Professor Jianye Hao’s research group. My research interests are offline reinforcement learning and applications of RL. I have published more than 20 papers in top AI conferences. Outside of research, I’m a huge fan of basketball, snowboarding, and orienteering.

I’m interested in:

  • Reinforcement Learning
  • Offline Reinforcement Learning
  • Applications of Deep Reinforcement Learning

🎓 Education

  • 2020.09 - 2024.06, Tianjin University, PhD
  • 2018.09 - 2020.06, Tianjin University, Master’s
  • 2014.09 - 2018.06, Tianjin University, Bachelor’s

📝 Selected Publications

Papers


IJCAI 2024

Kai Zhao, Jianye Hao, Yi Ma, Jinyi Liu, Yan Zheng, Zhaopeng Meng.
ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles.
IJCAI 2024. (CCF A)
[link]

ICLR 2024

Yifu Yuan, Jianye Hao, Yi Ma, Zibin Dong, Hebin Liang, Jinyi Liu, Zhixin Feng, Kai Zhao, Yan Zheng.
Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback.
ICLR 2024. (Top AI Conference)
[link]

NeurIPS 2023

Yi Ma, Hongyao Tang, Dong Li, Zhaopeng Meng.
Reining Generalization in Offline Reinforcement Learning via Representation Distinction.
NeurIPS 2023. (CCF A)
[link]

CIKM 2023

Hebin Liang, Zibin Dong, Yi Ma, Xiaotian Hao, Yan Zheng, Jianye Hao.
A Hierarchical Imitation Learning-based Decision Framework for Autonomous Driving.
CIKM 2023. (CCF B)
[link]

AAAI 2023

Hebin Liang*, Yi Ma*, Zilin Cao, Tianyang Liu, Fei Ni, Zhigang Li, Jianye Hao.
SplitNet: A Reinforcement Learning based Sequence Splitting Method for the MinMax Multiple Travelling Salesman Problem.
AAAI 2023. (CCF A)
[link]

CAAI AIR 2023

Yi Ma, Chao Wang, Chen Chen, Jinyi Liu, Zhaopeng Meng, Yan Zheng, Jianye Hao.
OSCAR: OOD State Conservative Offline Reinforcement Learning for Sequential Decision Making.
CAAI Artificial Intelligence Research 2023.
[link]

IJCAI 2022

Tong Sang, Hongyao Tang, Yi Ma, Jianye Hao, Yan Zheng, Zhaopeng Meng, Boyan Li, Zhen Wang.
PAnDR: Fast Adaptation to New Environments from Offline Experiences via Decoupling Policy and Environment Representations.
IJCAI 2022. (CCF A)
[link]

NeurIPS 2021

Yi Ma, Xiaotian Hao*, Jianye Hao, Jiawen Lu, Xing Liu, Xialiang Tong, Mingxuan Yuan, Zhigang Li, Jie Tang, Zhaopeng Meng.
A Hierarchical Reinforcement Learning Based Optimization Framework for Large-Scale Dynamic Pickup and Delivery Problems.
NeurIPS 2021. (CCF A)
[link]

KDD 2021

Fei Ni, Jianye Hao, Jiawen Lu, Xialiang Tong, Mingxuan Yuan, Jiahui Duan, Yi Ma, Kun He.
A Multi-Graph Attributed Reinforcement Learning based Optimization Algorithm for Large-scale Hybrid Flow Shop Scheduling Problem.
KDD 2021. (CCF A)
[link]

ICML 2020

Xiaotian Hao*, Zhaoqing Peng*, Yi Ma*, Guan Wang, Junqi Jin, Jianye Hao, Shan Chen, Rongquan Bai, Mingzhou Xie, Miao Xu, Zhenzhe Zheng, Chuan Yu, Han Li, Jian Xu, Kun Gai.
Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising.
ICML 2020. (CCF A)
[link]

Patents


  • CN113850414B. Method for Logistics Scheduling Planning Based on Graph Neural Networks and Reinforcement Learning. (First Inventor, Authorized)
  • CN114130034B. Multi-Agent Game AI Design Method Based on Attention Mechanism and Reinforcement Learning. (Fifth Inventor, Authorized)
  • CN113947348A. A Method and Device for Order Allocation. (Second Inventor, Under Review)
  • CN113869489A. Complex Game AI Design Method Based on Hierarchical Deep Reinforcement Learning. (Third Inventor, Under Review)
  • CN113869488A. Reinforcement Learning Method for Game AI Agents in Continuous-Discrete Mixed Decision Environments. (Third Inventor, Under Review)
  • CN114169421A. Cooperative Exploration Method in Sparse Reward Environments for Multi-Agent Systems Based on Intrinsic Motivation. (Fourth Inventor, Under Review)
  • CN114139681A. Meta-Reinforcement Learning Method Based on Contrastive Learning and Mutual Information. (Fourth Inventor, Under Review)

🏅 Competitions and Honors

  • 2022.12, NeurIPS 2022 SMARTS Autonomous Driving Competition. First Prize in both tracks. [link]
  • 2021.06, Huawei 2012 Central Research Institute Innovation Pioneer. President's Award Second Prize.
  • 2017.12, Intel Cup National College Students Software Innovation Competition, National Finals. Third Prize.

🏛️ Invited Talks

  • 2023.12, Reining Generalization in Offline Reinforcement Learning via Representation Distinction. @DAI 2023
  • 2022.01, A Hierarchical Reinforcement Learning Based Optimization Framework for Large-Scale Dynamic Pickup and Delivery Problems. @RLChina

💻 Internships

  • 2020.11-2023.11, Huawei Noah’s Ark Lab, Decision and Reasoning Team.
  • 2020.04-2020.10, Huawei Noah’s Ark Lab, Enterprise Intelligence Team.
  • 2019.07-2019.12, Alibaba, Alimama Target Advertising Team.