Posts
Toward Scalable Software Engineering Systems
In his “Bitter Lesson”1, 2025 Turing Award winner Richard Sutton argues that human-designed heuristics help when resources are scarce but hinder progress when resources become abundant. Methods that win at small scale often lose badly at large scale. The transformer architecture and scaling laws have made this pattern hard to ignore.
Yet core SE techniques built on classical algorithms—fuzzing, synthesis, and static analysis—have not fully absorbed these lessons. They rarely assume access to truly large-scale compute.
Expressive Type System is Savior of LLM
Through the RLVR (Reinforcement Learning with Verifiable Reward) methodology, we have been able to create reasoning models with overwhelming performance in domains where automatic verification is possible. However, this is not a panacea. Depending on what rules are used to assign rewards, LLMs can exploit these rules, leading to “reward hacking.”
From a safety perspective, reward hacking manifests as follows: LLMs insert exception handling code literally everywhere in order to generate safe code.
Why Language Models Hallucinate
Put simply, hallucination is a language model’s behavior that creates plausible but unsupported statements. It has been a known weakness since the earliest language models and remains unsolved. Many approaches—such as Retrieval-Augmented Generation (RAG) and reinforcement learning (RL) —reduce the rate but do not eliminate it.
So, why do language models hallucinate? First, some questions cannot be answered with certainty from the information at hand. With limited context, there is no way to guarantee a correct answer.