Tag: Multimodal | 小嗷犬

Multimodal

2024

[Paper Notes] xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs

Large Models, Paper Notes, Multimodal

2024-10-24

[Paper Notes] X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs

Large Models, Paper Notes, Multimodal

2024-10-20

[Paper Notes] VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval

Paper Notes, Multimodal

2024-10-20

[Paper Notes] MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding

Large Models, Paper Notes, Multimodal

2024-10-17

[Paper Notes] Sign2GPT: Leveraging Large Language Models for Gloss-Free Sign Language Translation

Large Models, Paper Notes, Sign Language Translation, Multimodal

2024-10-17

[Paper Notes] Fine-tuned CLIP Models are Efficient Video Learners

Paper Notes, Multimodal

2024-10-14

[Paper Notes] Factorized Learning Assisted with Large Language Model for Gloss-free Sign Language Translation

Large Models, Paper Notes, Sign Language Translation, Multimodal

2024-10-11

[Paper Notes] CLIP4Clip: An empirical study of CLIP for end to end video clip retrieval and captioning

Paper Notes, Multimodal

2024-10-10

[Paper Notes] VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Large Models, Paper Notes, Multimodal

2024-10-08

[Paper Notes] Flamingo: a Visual Language Model for Few-Shot Learning

Large Models, Paper Notes, Multimodal

2024-09-30