2026-04-09

# 🎞️ DeltaWorld: 1 frame = 1 token (1,024× video compression), CVPR 2026

**Source:** ? —

---

## Facebook Post (public, 🏛️ removed)

🎞️ DeltaWorld: compress video 1,024× and still do world modeling

CVPR 2026 paper: the DeltaTok tokenizer encodes frame-to-frame differences as one delta token per frame.

💡 Principle:
Before: video = 3D spatio-temporal volume (H × W × T) → a huge number of tokens
Now: delta token = only the difference between consecutive frames → 1D temporal sequence
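
To make the 3D → 1D idea concrete, here is a toy sketch of the shapes involved. It is not the DeltaTok encoder: `toy_delta_tokens` is a hypothetical helper, and mean pooling the per-patch difference is only a placeholder for whatever learned encoder the paper uses. The sketch also ignores how the first frame is represented.

```python
import random

def toy_delta_tokens(frames):
    """Collapse each frame-to-frame difference into a single value.

    frames: list of T frames, each a flat list of per-patch feature values.
    Returns T-1 toy "delta tokens", one per frame transition.
    Mean pooling is a stand-in (assumption), not the paper's method.
    """
    deltas = []
    for prev, curr in zip(frames, frames[1:]):
        diff = [c - p for p, c in zip(prev, curr)]
        deltas.append(sum(diff) / len(diff))
    return deltas

random.seed(0)
T, patches_per_frame = 8, 32 * 32  # illustrative sizes, not from the paper
frames = [[random.random() for _ in range(patches_per_frame)] for _ in range(T)]

tokens = toy_delta_tokens(frames)

# A patch-grid tokenizer keeps every patch of every frame.
grid_token_count = T * patches_per_frame
# The delta view keeps one entry per frame transition: a 1D temporal sequence.
delta_token_count = len(tokens)
print(grid_token_count, delta_token_count)
```

The point is only the change in sequence length: the grid view grows with H × W × T, while the delta view grows with T alone.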

📊 Numbers:
• 1,024× token reduction at 512×512 frames
• 3D representation → 1D sequence
• Multi-hypothesis training: generates diverse futures in a single forward pass
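
One way to read the 1,024× figure is as simple patch arithmetic. The sketch below assumes the baseline is a 16×16 patch grid, which is a common choice but is not stated in this post; `tokens_per_frame` is a hypothetical helper.

```python
def tokens_per_frame(height, width, patch):
    """Number of non-overlapping patch tokens in one frame."""
    if height % patch or width % patch:
        raise ValueError("frame size must be divisible by the patch size")
    return (height // patch) * (width // patch)

# Assumed baseline: 16x16 patches on a 512x512 frame (not stated in the post).
grid_tokens = tokens_per_frame(512, 512, 16)   # 32 * 32
delta_tokens = 1                               # one delta token per frame
reduction = grid_tokens // delta_tokens
print(grid_tokens, reduction)  # prints "1024 1024"
```

Under that assumption, a 512×512 frame yields 1,024 patch tokens, so one token per frame matches the quoted 1,024× reduction.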

🎯 Applications:
• World modeling for forecasting
• Agent planning that requires predicting the future
• Efficient video generation
• Robotics, autonomous driving, game AI

🛠️ What to do next:
1. Project page: deltatok.github.io
2. Code + weights available (per abstract)
3. Paper: arxiv.org/abs/2604.04913
4. Train it yourself with a reproducible workflow
5. Accepted at CVPR 2026 (peer reviewed)

💡 Key insight:
Efficient representation is the key to video AI: if video can be compressed 1,024× like this, models that currently need an A100/H100 could run on consumer GPUs within 1-2 years.

📄 arxiv.org/abs/2604.04913

#CVPR2026 #Video #Research #Efficient #PowerBoltAI
