
Professor Liu Jun in Daily Economic News Interview: Higher-Level AI Development May Require Breaking Through Potential Bottlenecks in the Current Paradigm

Date: 2025-10-05



Recently, at the 2025 Financial Street Forum Annual Conference, Professor Liu Jun, a member of the US National Academy of Sciences, Xinghua Distinguished Chair Professor at Tsinghua University, and Chair of Tsinghua's Department of Statistics and Data Science, gave an on-site interview to a reporter from Daily Economic News (hereinafter "NBD").

In the interview, Professor Liu Jun said that for artificial intelligence (AI) to reach a higher level of development, it may be necessary to break through the potential bottlenecks inherent in the current paradigm, in which large language models rely on statistical probability to "predict the next token." While this approach has seen many incremental refinements, a more advanced dominant paradigm has yet to emerge.

Professor Liu Jun has long conducted research, with outstanding contributions, in areas such as Bayesian statistical theory, Monte Carlo methods, statistical machine learning, state-space models and time series, and bioinformatics, exerting a profound influence on big data processing and machine learning. In the interview, he also discussed the development of statistics as a discipline, pointing out that over the past decades, advances in biomedical and other large-scale data generation technologies have continuously driven the progress of statistics as a fundamental discipline.

The Gibbs sampling algorithm for motif discovery (the "Gibbs motif sampler") co-developed by Professor Liu Jun was once one of the most popular algorithms among biologists for discovering subtle patterns in DNA and protein sequences, achieving significant success in understanding gene regulation and protein homology.

Professor Liu Jun delivering a speech (Photo: Reporter Zhang Shoulin, Daily Economic News)

NBD: Large language models rely on big data and statistical probability, generating linguistic responses by continuously predicting the next word. This differs considerably from the public perception that AI reasons and judges based on semantics. What is your view on this issue?

Liu Jun: To believe that large language models understand semantics is a romantic narrative. The cornerstone of large language models is "next token prediction," that is, generating word by word, without genuinely "understanding" language itself, even though tools like DeepSeek and ChatGPT often produce impressive results. In statistical terminology, next-token prediction corresponds to the "autoregressive model," which predicts step by step from associations within word (or temporal) sequences. From this perspective, it could become a bottleneck for AI models striving for higher-level development, so language models may need to consider how to break through this line of thinking in their next steps.
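To make the mechanism Professor Liu describes concrete, the following is a minimal sketch of autoregressive generation: the joint probability of a sequence is factored into a product of conditionals, and the sampler repeatedly draws the next token from the conditional distribution given what has been generated so far. The vocabulary and fixed bigram table below are hypothetical stand-ins for illustration only; a real language model would condition on the entire prefix with a neural network.

    import numpy as np

    # Toy next-token prediction: an autoregressive model factorizes
    # P(x1..xn) into a product of P(x_t | history) and generates by
    # sampling from that conditional distribution one token at a time.
    VOCAB = ["the", "cat", "sat", "on", "mat", "<eos>"]

    # P(next token | previous token): row = previous, column = next.
    BIGRAM = np.array([
        [0.00, 0.45, 0.05, 0.05, 0.40, 0.05],  # after "the"
        [0.05, 0.00, 0.70, 0.10, 0.05, 0.10],  # after "cat"
        [0.10, 0.05, 0.00, 0.70, 0.05, 0.10],  # after "sat"
        [0.80, 0.05, 0.00, 0.00, 0.10, 0.05],  # after "on"
        [0.05, 0.05, 0.05, 0.05, 0.00, 0.80],  # after "mat"
        [0.00, 0.00, 0.00, 0.00, 0.00, 1.00],  # after "<eos>"
    ])

    def generate(start="the", max_len=10, seed=0):
        rng = np.random.default_rng(seed)
        tokens = [start]
        while len(tokens) < max_len and tokens[-1] != "<eos>":
            probs = BIGRAM[VOCAB.index(tokens[-1])]         # conditional distribution
            tokens.append(str(rng.choice(VOCAB, p=probs)))  # draw the next token
        return " ".join(tokens)

    print(generate())  # one possible output: "the cat sat on the mat <eos>"

Nothing in this loop ever represents the meaning of the sentence; each step only consults a conditional probability, which is precisely the property Professor Liu identifies as a potential bottleneck.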

In fact, some researchers are already experimenting with new approaches: instead of predicting word by word, they generate text segment by segment, akin to first constructing a sentence framework and then filling in specific words.

Under this approach, each word is treated as a latent variable during training: its position is masked out (left blank), and the text is then recovered through a denoising process. Reports indicate that this method yields promising results, although it is currently hard to claim that it is superior to next-token prediction.
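The following is a hedged sketch of this fill-in-the-blank idea, loosely inspired by masked-denoising generation schemes rather than any specific published system: every position starts as a blank (latent) token, and each denoising round commits the most confident predictions until no blanks remain. The toy_predict function is a placeholder for a trained network that would score all blanks jointly.

    import random

    MASK = "_"

    def denoise_step(seq, predict):
        # Fill the most "confident" still-blank positions in this round.
        blanks = [i for i, tok in enumerate(seq) if tok == MASK]
        proposals = {i: predict(seq, i) for i in blanks}  # (token, confidence)
        # Commit roughly the top half of blanks by confidence; keep the rest latent.
        commit = sorted(blanks, key=lambda i: -proposals[i][1])[: max(1, len(blanks) // 2)]
        for i in commit:
            seq[i] = proposals[i][0]
        return seq

    def toy_predict(seq, i):
        # Placeholder "model": returns a random word and a random confidence.
        return random.choice(["the", "cat", "sat", "on", "mat"]), random.random()

    random.seed(0)
    seq = [MASK] * 6                  # the whole segment starts as blanks
    while MASK in seq:
        seq = denoise_step(seq, toy_predict)
        print(" ".join(seq))          # blanks resolve over a few rounds

Unlike the autoregressive loop above, each round here revises the segment as a whole, which is the "framework first, details later" behavior described in the interview.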

This holistic, planning-oriented mode of language generation more closely resembles the human process of thinking and expression. Further development of this paradigm may bring new surprises, although its prospects remain uncertain.

NBD: The foundational field of statistics is quite mature today. What unresolved challenges does this fundamental discipline currently face?

Liu Jun: Statistics is an open discipline. In other words, it does not have a fixed set of problems waiting to be solved, nor will its development ever be "declared complete" once some particular class of problems is solved.

Many problems in statistics originate from practice. For instance, due to widespread interest in issues related to large models, problems concerning high-dimensional data within statistics are receiving more discussion. This exemplifies problem exploration and method development driven by applications.

Looking back at the early development of statistics, the discipline was primarily driven by research in astronomy and social demography. Entering the 20th century, statistics advanced further through developments in genetics, agricultural breeding, industrial experimental design, and more.

Take the British statistician Ronald Fisher, who was also a renowned geneticist. To meet the needs of population genetics research, he proposed famous probability-based models of evolution; to address the needs of agricultural experiments, he developed methods such as randomization and Latin square designs, along with theories and methods of statistical inference such as analysis of variance (ANOVA).

For decades, the rapid development of medicine and biology has continuously driven statistics forward. I myself work in bioinformatics. Take molecular biology as an example: gene chip (microarray) data record the expression levels of genes within cells. By analyzing how these genes are inherited and how they vary, one can identify associations between specific variants and diseases, thereby supporting targeted drug development. All of these processes require statistics to keep updating its methods to meet evolving needs.
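As a concrete illustration of the kind of variant-disease association analysis described above, here is a minimal sketch using a chi-square test of independence on a 2x2 table. The counts are invented purely for illustration; real studies involve far larger cohorts, corrections for multiple testing, and adjustment for confounders.

    from scipy.stats import chi2_contingency

    # Hypothetical 2x2 table: rows = carries the variant or not,
    # columns = disease cases vs. healthy controls. Counts are made up.
    table = [
        [120,  80],   # variant carriers:  [cases, controls]
        [ 60, 140],   # non-carriers:      [cases, controls]
    ]

    chi2, p_value, dof, expected = chi2_contingency(table)
    print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p_value:.4g}")
    # A small p-value suggests the variant and the disease are associated.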

NBD: The public is also quite interested in another application scenario of statistics: stock investment. This is also a process of probabilistic decision-making. From this perspective, can investors with a statistics background perform better in stock investment?

Liu Jun: To my knowledge, investment institutions are indeed willing to hire people with a statistics background. For personal investing, however, a solid grasp of statistics does not by itself guarantee superior performance. Investing also requires studying macroeconomics and other areas, and it demands extensive training, substantial capital, and significant effort. Individuals may lack the bandwidth to handle all of this, and their capital may not support frequent trading. Overall, it is still the large, leading investment institutions and hedge funds that tend to perform better.

News Source: Daily Economic News