Alibaba researchers launch Marco-o1, an LLM with advanced reasoning capabilities.




The recent release of OpenAI o1 has brought a lot of attention to large reasoning models (LRMs) and inspired a new wave of models aimed at solving complex problems that classical language models often struggle with. Building on o1's success and the LRM concept, Alibaba researchers have introduced Marco-o1, which improves reasoning ability and tackles problems with open-ended solutions, where clear standards and quantifiable rewards are missing.

OpenAI o1 uses “inference-time scaling” to improve the model’s reasoning ability by giving it “time to think.” Essentially, the model uses more compute cycles during inference to generate more tokens and review its responses, which improves performance on reasoning tasks. o1 is known for its impressive reasoning abilities, especially on tasks with standard answers such as math, physics, and coding.
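OpenAI has not published how o1 implements this, so the snippet below is only a minimal, hypothetical illustration of the general idea of spending extra inference compute: sample several candidate answers and keep the one a scoring function prefers (best-of-n). The `generate` and `score` helpers are placeholders, not real APIs.

```python
import random

def generate(prompt: str, seed: int) -> str:
    """Hypothetical stand-in for an LLM call that samples one candidate answer."""
    random.seed(seed)
    return f"candidate answer #{seed}"

def score(prompt: str, answer: str) -> float:
    """Hypothetical verifier / reward model that rates an answer's quality."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    """Spend more compute at inference time: sample n answers, keep the best one.
    Larger n trades extra tokens (and latency) for a better chance of a good answer."""
    candidates = [generate(prompt, seed=i) for i in range(n)]
    return max(candidates, key=lambda ans: score(prompt, ans))

print(best_of_n("Prove that the sum of two even numbers is even.", n=4))
```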

However, many applications involve open-ended problems that lack clear solutions and quantifiable rewards. “We aim to push the boundaries of LLMs even further, enhancing their reasoning abilities to tackle complex real-world challenges,” the Alibaba researchers wrote.

Marco-o1 is a fine-tuned version of Alibaba’s Qwen2-7B-Instruct that combines advanced techniques such as chain-of-thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), and reasoning action strategies.

The researchers trained Marco-o1 on several datasets, including the Open-O1 CoT dataset; the Marco-o1 CoT dataset, a synthetic dataset generated using MCTS; and the Marco-o1 Instruction dataset, a custom collection of instruction-following data for reasoning tasks.

Marco-o1 uses CoT and MCTS to reason about tasks (source: arXiv)

MCTS is a search algorithm that has proven effective in complex problem-solving scenarios. It intelligently explores different solution paths by repeatedly sampling possibilities, simulating outcomes, and gradually building a decision tree. It has proven very effective in hard AI problems, such as winning the game of Go.

Marco-o1 leverages MCTS to explore multiple reasoning paths as it generates response tokens. The model uses the confidence scores of candidate response tokens to build its decision tree and explore different branches. This allows the model to consider a wider range of possibilities and reach more informed and nuanced conclusions, especially in scenarios with open-ended solutions. The researchers also introduced a flexible reasoning action strategy that lets users adjust the granularity of the MCTS procedure by setting the number of tokens generated at each node in the tree, creating a trade-off between accuracy and compute cost and giving users the flexibility to balance performance and efficiency.
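As a rough sketch of how MCTS can be wired into step-by-step generation, the Python below builds a search tree over partial reasoning chains, scores nodes with a placeholder token-confidence function, and exposes the tokens-per-node granularity described above. It is an illustrative approximation, not Marco-o1’s released code; `propose_steps` and `step_confidence` are hypothetical stand-ins for model calls.

```python
import math
import random
from dataclasses import dataclass, field

# Hypothetical stand-ins for model calls; Marco-o1's actual interfaces differ.
def propose_steps(chain: list[str], step_tokens: int, k: int = 3) -> list[str]:
    """Sample k candidate continuations of roughly `step_tokens` tokens each (placeholder)."""
    return [f"step({len(chain)},{i})" for i in range(k)]

def step_confidence(chain: list[str], step: str) -> float:
    """Average token confidence of a candidate step, e.g. from softmax probabilities (placeholder)."""
    return random.random()

@dataclass
class Node:
    chain: list[str]                         # partial reasoning chain so far
    parent: "Node | None" = None
    children: list["Node"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0                       # accumulated confidence-based reward

def uct(node: Node, c: float = 1.4) -> float:
    """Upper-confidence bound used to pick which branch to explore next."""
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(math.log(node.parent.visits) / node.visits)

def mcts(root_prompt: str, step_tokens: int = 32, iters: int = 50, depth: int = 4) -> list[str]:
    """Build a search tree over reasoning steps; `step_tokens` sets the node granularity
    (fewer tokens per node = finer search but more model calls)."""
    root = Node(chain=[root_prompt])
    for _ in range(iters):
        node = root
        # Selection: descend by UCT until we reach a leaf.
        while node.children:
            node = max(node.children, key=uct)
        # Expansion: add candidate next steps unless the depth limit is hit.
        if len(node.chain) - 1 < depth:
            for step in propose_steps(node.chain, step_tokens):
                node.children.append(Node(chain=node.chain + [step], parent=node))
            node = random.choice(node.children)
        # Evaluation: reward the node by the confidence of its last step.
        reward = step_confidence(node.chain[:-1], node.chain[-1])
        # Backpropagation: push the reward up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Return the most-visited path as the final reasoning chain.
    chain, node = [root_prompt], root
    while node.children:
        node = max(node.children, key=lambda n: n.visits)
        chain.append(node.chain[-1])
    return chain

print(mcts("Translate the idiom, thinking step by step."))
```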

Another key innovation in Marco-o1 is the introduction of a reflection mechanism. During the reasoning process, the model periodically prompts itself with the phrase, “Wait! Maybe I made some mistakes! I need to rethink from scratch.” This causes the model to re-evaluate its reasoning steps, identify potential errors, and refine its thought process.

“This approach allows the model to act as its own critic, identifying potential errors in its reasoning,” the researchers wrote. “By explicitly encouraging the model to question its initial conclusions, we prompt the model to re-express and refine its thought process.”
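In practice, this kind of reflection can be approximated at the prompt level. Here is a minimal sketch assuming a generic chat-style `complete` function (a hypothetical placeholder, not Marco-o1’s API), in which the reflection phrase is appended after each draft answer so the model critiques and regenerates its reasoning.

```python
REFLECTION_PROMPT = "Wait! Maybe I made some mistakes! I need to rethink from scratch."

def complete(messages: list[dict]) -> str:
    """Hypothetical LLM call returning the assistant's next message."""
    return "...revised reasoning and answer..."

def reason_with_reflection(question: str, rounds: int = 2) -> str:
    """Generate an answer, then repeatedly ask the model to re-examine its own
    reasoning by appending the reflection phrase and regenerating."""
    messages = [{"role": "user", "content": question}]
    answer = complete(messages)
    for _ in range(rounds):
        messages += [
            {"role": "assistant", "content": answer},
            {"role": "user", "content": REFLECTION_PROMPT},
        ]
        answer = complete(messages)  # the model critiques and refines its prior answer
    return answer

print(reason_with_reflection("How should this idiom be translated?"))
```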

To evaluate the effectiveness of Marco-o1, the researchers ran experiments on several tasks, including MGSM, a benchmark of multilingual grade-school math problems. Marco-o1 significantly outperformed the base Qwen2-7B model, especially when the MCTS component was tuned to single-token granularity.

Marco-o1 compared to different versions of the base model (source: arXiv)

However, the main goal of Marco-o1 is to address the challenge of reasoning in open-ended scenarios. To this end, the researchers tested the model on translating colloquial and slang expressions, a task that requires understanding subtle nuances of language, culture, and context. Experiments showed that Marco-o1 captures and translates these expressions more effectively than traditional translation tools. For example, the model correctly translated a colloquial Chinese expression whose literal rendering is “These shoes feel like stepping on shit” into its intended English equivalent, “This shoe has a comfortable sole.” The model’s reasoning chain shows how it evaluates different possible meanings and arrives at the correct translation.

This paradigm can prove useful for tasks such as product design and strategy, which require deep contextual understanding and lack clearly defined standards and metrics.

Example of Marco-o1’s reasoning chain for a translation task (source: arXiv)

A new wave of reasoning models

Since the release of o1, AI labs have been racing to release reasoning models. Last week, DeepSeek, a Chinese AI lab, released R1-Lite-Preview, its o1 competitor, which is currently available only through the company’s online chat interface. R1-Lite-Preview reportedly beats o1 on several key benchmarks.

The open source community is also keeping pace with the private model market, releasing models and datasets that take advantage of inference-time scaling laws. The Alibaba team released Marco-o1 on Hugging Face along with a partial reasoning dataset that researchers can use to train their own reasoning models. Another recently released model is LLaVA-o1, developed by researchers from several universities in China, which brings the inference-time reasoning paradigm to open source vision-language models (VLMs).

The release of these models comes amid uncertainty about the future of model scaling laws. Various reports indicate that the returns from training larger models are diminishing and may be hitting a wall. What is certain is that we are just beginning to explore the possibilities of inference-time scaling.


