Expressive Type Systems Are the Savior of LLMs
Through the RLVR (Reinforcement Learning with Verifiable Rewards) methodology, we have been able to build reasoning models with overwhelming performance in domains where answers can be verified automatically. However, RLVR is not a panacea: depending on which rules are used to assign rewards, LLMs can exploit those rules, a failure mode known as “reward hacking.”
From a safety perspective, reward hacking manifests as follows: to produce “safe” code, LLMs insert exception-handling code literally everywhere. The resulting code is safe, but not in the form we wanted. Designing a perfect reward system is a challenge not only for reinforcement learning but for all of humanity (Goodhart’s Law).
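As a hypothetical illustration of this over-defensive style (the function and values here are invented for the example), consider code where a second safety check is unreachable because the first already handled the failure case. A reward that only measures “no crashes” never punishes the redundancy:

```rust
// Over-defensive style: the second check is unreachable, because the
// first one already handled the empty case -- yet a "no crashes"
// reward signal never penalizes it.
fn mean(xs: &[f64]) -> f64 {
    if xs.is_empty() {
        return 0.0; // defensive fallback for the empty slice
    }
    // Redundant: we already know xs is non-empty at this point.
    let Some(_head) = xs.first() else {
        return 0.0; // unreachable
    };
    let sum: f64 = xs.iter().sum();
    sum / xs.len() as f64
}

fn main() {
    println!("{}", mean(&[1.0, 2.0, 3.0])); // prints 2
}
```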
To address this problem, I propose having LLMs use expressive, strong type systems. LLMs insert exception handling everywhere because they are uncertain about where errors can occur. Even if a model has determined at some point that an error cannot occur, its context is reset with each session, so that knowledge is lost. If types indicate where errors can occur, LLMs can confidently remove unnecessary safety code.
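Rust’s standard library already demonstrates this: `str::parse` returns a `Result`, so the possibility of failure must be handled exactly once at the boundary, while `NonZeroU32` proves non-zero-ness in the type, so division needs no runtime check. A minimal sketch (the function `per_item_cost` and its values are invented for illustration):

```rust
use std::num::NonZeroU32;

// The signature tells the caller exactly where failure is possible:
// `count` is NonZeroU32, so division by zero is ruled out by the type
// and no defensive check is needed inside the function.
fn per_item_cost(total: u32, count: NonZeroU32) -> u32 {
    total / count.get() // cannot divide by zero: non-zero by construction
}

fn main() {
    // Uncertainty is handled once, where it actually exists:
    // parsing can fail, so `parse` returns a Result.
    let count: NonZeroU32 = "4".parse().expect("not a positive integer");
    println!("{}", per_item_cost(100, count)); // prints 25
}
```

A model reading this signature needs no session memory to know the division is safe; the type carries that knowledge across context resets.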
During training, unnecessary safety code could be detected with a type checker or linter and the model penalized for it. It is time to move beyond Python to languages like Rust and Lean, whose type systems are stronger and more expressive.
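One way this penalty could be wired into the reward is simple reward shaping: run the checker on the model’s output, count findings that flag redundant safety code (for example, unreachable or always-true checks), and subtract a per-finding penalty from the verifiable task reward. A hedged sketch, with hypothetical names and values:

```rust
// Hypothetical reward shaping: `redundant_lints` is the number of
// redundant-safety findings reported by a type checker or linter on
// the model's generated code; each finding costs `per_lint_penalty`.
fn shaped_reward(base_reward: f64, redundant_lints: u32, per_lint_penalty: f64) -> f64 {
    base_reward - per_lint_penalty * redundant_lints as f64
}

fn main() {
    // e.g. the task passed (reward 1.0) but 3 redundant checks were flagged
    println!("{:.1}", shaped_reward(1.0, 3, 0.1));
}
```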