9 Types of Multimodal Chain-of-Thought (MCoT)

AI Tools · posted by dongdong, 3 weeks ago

Nine MCoT (Multimodal Chain-of-Thought) methods, summarized below with links to each paper.

  • KAM-CoT: Knowledge-Augmented Multimodal Chain-of-Thought Reasoning (2401.12863). This lightweight framework augments CoT reasoning with knowledge graphs (KGs), achieving 93.87% accuracy on the ScienceQA benchmark.
    Access: https://huggingface.co/papers/2401.12863
  • Multimodal Visualization-of-Thought (MVoT): Imagine while Reasoning in Space: Multimodal Visualization-of-Thought (2501.07542). It prompts the model to generate interleaved visual reasoning traces and uses a token discrepancy loss to improve the quality of the generated visualizations.
    Access: https://huggingface.co/papers/2501.07542
  • Compositional CoT (CCoT): Compositional Chain-of-Thought Prompting for Large Multimodal Models (2311.17076). It utilizes scene graphs (SGs) generated by the LMM itself to enhance performance on compositional and general multimodal benchmarks.
    Access: https://huggingface.co/papers/2311.17076
  • URSA: Understanding and Verifying Chain-of-Thought Reasoning in Multimodal Mathematics (2501.04686). It introduces System 2-style thinking into multimodal mathematical reasoning and employs a three-module CoT data synthesis pipeline comprising CoT distillation, trajectory-format rewriting, and format unification.
    Access: https://huggingface.co/papers/2501.04686
  • MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification (2502.13383). It introduces two components, the MM-Verifier and the MM-Reasoner, which together synthesize high-quality CoT data for multimodal reasoning.
    Access: https://huggingface.co/papers/2502.13383
  • Duty-Distinct CoT (DDCoT): Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models (2310.16436). It divides reasoning responsibilities between the language model and the vision model, integrating visual recognition capabilities into the joint reasoning process.
    Access: https://huggingface.co/papers/2310.16436
  • Multimodal-CoT: Multimodal Chain-of-Thought Reasoning in Language Models (2302.00923). A two-stage framework that separates rationale generation from answer prediction, enabling the model to reason more effectively over multimodal inputs.
    Access: https://huggingface.co/papers/2302.00923
  • Graph-of-Thought (GoT): Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models (2305.16582). This two-stage framework models reasoning as an interconnected graph of thoughts, improving performance on both text-only and multimodal tasks.
    Access: https://huggingface.co/papers/2305.16582
  • Hypergraph-of-Thought (HoT) (2308.06207). It models high-order multi-hop reasoning using textual and visual hypergraphs coupled by cross-modal co-attention.
    Access: https://huggingface.co/papers/2308.06207
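Several of the methods above (Multimodal-CoT, URSA, GoT) share a two-stage recipe: first generate an intermediate rationale from the fused multimodal context, then predict the answer conditioned on that rationale. A minimal sketch of that pattern, with a stubbed `toy_model` callable standing in for a real multimodal LLM — the function names and prompt wording here are illustrative assumptions, not any paper's actual API:

```python
from typing import Callable

def two_stage_mcot(
    model: Callable[[str], str],
    question: str,
    image_caption: str,
) -> tuple[str, str]:
    """Two-stage multimodal CoT: (1) rationale generation, (2) answer prediction.

    `image_caption` stands in for real visual input; production systems fuse
    vision-encoder embeddings rather than a text caption.
    """
    # Stage 1: generate a free-form rationale from the fused context.
    rationale = model(
        f"Image: {image_caption}\nQuestion: {question}\n"
        "Explain the reasoning step by step."
    )
    # Stage 2: predict the answer conditioned on the generated rationale.
    answer = model(
        f"Image: {image_caption}\nQuestion: {question}\n"
        f"Rationale: {rationale}\nAnswer:"
    )
    return rationale, answer

# Toy stand-in model: returns a canned rationale, then a short answer.
def toy_model(prompt: str) -> str:
    if prompt.endswith("Answer:"):
        return "4"
    return "There are two pairs of apples, so 2 + 2 = 4."

rationale, answer = two_stage_mcot(
    toy_model, "How many apples are shown?", "two pairs of red apples"
)
print(answer)  # → 4
```

The key design choice, as argued in the Multimodal-CoT paper, is that decoupling the two stages lets answer prediction attend to an explicit rationale instead of inferring silently in one pass.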