Towards Better Understanding of Program-of-Thought Reasoning
This research by SCBX Group and partners shows that Program-of-Thought (PoT) improves multilingual reasoning in large language models (LLMs) by separating reasoning from execution. After fine-tuning, PoT outperforms Chain-of-Thought (CoT), and the quality of the generated reasoning is closely tied to whether the final answer is correct.

The Big Problem: AI Struggles with Foreign Languages
Today’s AI models are very capable, but they often stumble when asked to solve complex, multi-step math problems in languages other than English. Traditionally, an AI model solves a problem by writing out its reasoning steps in ordinary sentences while doing the arithmetic in its “head” at the same time. When it does this in a less familiar language, it is more likely to get confused and make mistakes.
The Clever Fix: Teaching AI to Code
To address this, the researchers tested a method called “Program-of-Thought.” Instead of making the AI calculate the answer itself, this method teaches it to write a computer program (specifically in Python) that lays out the logical steps of the problem.
Once the AI has written the code, a standard Python interpreter runs it and produces the final answer, acting as the calculator. Because the interpreter handles the actual arithmetic, the AI only has to get the logic right, which makes the language barrier much less of an issue.
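To make the idea concrete, here is a minimal Python sketch of the execution step. The program text in GENERATED_PROGRAM is a hand-written stand-in for what a model might produce for a problem like “A shop has 7 boxes of 12 apples and 15 apples get eaten; how many are left?”, and the convention that the program stores its result in a variable called answer is an assumption for this illustration, not necessarily the format the researchers used. The question could be asked in any language; the program, and the arithmetic the interpreter performs, stay the same.

```python
# A minimal Program-of-Thought execution sketch (illustrative only).
# GENERATED_PROGRAM stands in for model output; storing the result in a
# variable named `answer` is an assumption made for this example.

GENERATED_PROGRAM = """
apples_per_box = 12
boxes = 7
eaten = 15
answer = apples_per_box * boxes - eaten
"""


def run_program(program: str):
    """Execute model-written code and return the value bound to `answer`."""
    namespace: dict = {}
    # The interpreter, not the model, does the arithmetic.
    # NOTE: exec runs arbitrary code; a real system should sandbox it.
    exec(program, namespace)
    return namespace["answer"]


if __name__ == "__main__":
    print(run_program(GENERATED_PROGRAM))  # prints 69
```

Calling exec on model-written code is only acceptable in a toy demo like this one; a real pipeline would run the program in a sandbox with time and memory limits.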

The “Secret Sauce”: How to Train the AI
When humans write computer code, they often leave short explanatory notes inside it called “comments.” The researchers found two useful rules for handling these comments in the training data:
- For AI trained only in English: If the AI will be tested on languages it has never seen, it is actually better to remove the comments entirely. Without the English notes to distract it, the AI generalizes better to problems in those unseen languages.
- For AI trained in many languages: If the AI is trained on multiple languages from the start, translating the comments into the language of each problem gives the best results.
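The first rule, removing comments, can be sketched in a few lines of Python. This is one possible approach, assuming the training targets are stored as Python source strings; it is not necessarily how the researchers implemented it. Because comments never make it into Python’s abstract syntax tree, parsing a program with ast.parse and printing it back with ast.unparse drops them. The second rule, translating comments, would additionally need a machine-translation step, which is not shown here.

```python
import ast


def strip_comments(source: str) -> str:
    """Return `source` with all '#' comments removed.

    Comments are discarded by the parser, so a parse/unparse round trip
    drops them. ast.unparse requires Python 3.9 or later.
    """
    return ast.unparse(ast.parse(source))


SAMPLE = """\
# Cost of 4 items at 3 each
price = 3  # price per item
total = price * 4
"""

if __name__ == "__main__":
    print(strip_comments(SAMPLE))
```

A side effect of this round trip is that formatting gets normalized, and docstrings survive because Python treats them as string values rather than comments.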
The “Quality Check” Trick
The researchers also found a strong link between code quality and correctness: when the AI writes high-quality code, the final answer is very likely to be right.
Building on this, and using an automated code-grading tool, they devised a simple strategy to boost accuracy. They had the AI write up to 40 different candidate programs for a single math problem, graded every program, and kept the answer produced by the highest-scoring one. This quality check produced a large jump in accuracy: in one test, the success rate rose from 31.6% to 56.6%.
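The selection step can be sketched as a best-of-N loop in Python. The grader is passed in as score_fn, a hypothetical stand-in for the automated grading tool; the scores in FAKE_SCORES are made up for the demo, and the three candidates play the role of the up to 40 programs the model would generate. Each program is assumed, for illustration, to store its result in a variable named answer.

```python
from typing import Any, Callable, Optional


def run_program(program: str) -> Optional[Any]:
    """Execute a candidate program; return its `answer` variable, or None on error."""
    namespace: dict = {}
    try:
        exec(program, namespace)  # a real system should sandbox this
    except Exception:
        return None
    return namespace.get("answer")


def best_of_n(candidates: list[str], score_fn: Callable[[str], float]) -> Optional[Any]:
    """Return the answer produced by the highest-scoring candidate program."""
    if not candidates:
        return None
    best = max(candidates, key=score_fn)  # rank by the grader's quality score
    return run_program(best)


# Hypothetical candidates for "7 boxes of 12 apples, 15 eaten: how many left?"
CANDIDATES = [
    "answer = 12 * 7 + 15",  # wrong operation
    "answer = 12 * 7 - 15",  # correct
    "answer = 12 + 7 - 15",  # wrong operation
]
# Made-up quality scores standing in for the grader's output.
FAKE_SCORES = {CANDIDATES[0]: 2.0, CANDIDATES[1]: 4.0, CANDIDATES[2]: 1.0}

if __name__ == "__main__":
    print(best_of_n(CANDIDATES, FAKE_SCORES.__getitem__))  # prints 69
```

Here the highest-scoring candidate is the correct one, so the call returns 69. Only the top-ranked program is executed; a variant could fall back to the next-best candidate when the top one crashes. For scale, the reported improvement from 31.6% to 56.6% is an absolute gain of 25.0 percentage points, or roughly 79% relative to the starting accuracy.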
Why It Matters
Overall, this research suggests that teaching AI to reason like a computer programmer is a much more effective way to help it solve complex problems in languages other than English. This is an important step toward making advanced AI tools useful for people all over the world, whatever language they speak.


