Paper Notes
2024
[Paper Notes] xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
[Paper Notes] xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs
[Paper Notes] MLSLT: Towards Multilingual Sign Language Translation
[Paper Notes] X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
[Paper Notes] VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval
[Paper Notes] MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding
[Paper Notes] Sign2GPT: Leveraging Large Language Models for Gloss-Free Sign Language Translation
[Paper Notes] Fine-tuned CLIP Models are Efficient Video Learners
[Paper Notes] Factorized Learning Assisted with Large Language Model for Gloss-free Sign Language Translation
[Paper Notes] CLIP4Clip: An empirical study of CLIP for end to end video clip retrieval and captioning