Reading Notes on "DeepSeek-VL: Towards Real-World Vision-Language Understanding"

Original paper: DeepSeek-VL: Towards Real-World Vision-Language Understanding

https://arxiv.org/pdf/2403.05525

Main Contributions

High-resolution visual encoding: 1024 × 1024 image input

Three-stage training pipeline

Modality warm-up strategy

Main Architecture

The architecture consists of three parts:

a hybrid vision encoder, a vision-language adaptor, and a language model.

Hybrid Vision Encoder

SigLIP serves as the vision encoder, extracting high-level feature representations from visual inputs. A single SigLIP encoder, however, struggles with real-world problems: CLIP-family encoders suffer from ambiguous encodings, so visually distinct images can be mapped to similar embeddings, and they are constrained by relatively low input resolutions (e.g., 224 × 224, 336 × 336, 384 × 384, 512 × 512). This hampers tasks that demand fine-grained low-level features, such as dense OCR and visual grounding.

To capture low-level features at high resolution, DeepSeek-VL additionally employs SAM-B to process 1024 × 1024 high-resolution image inputs, while retaining the SigLIP-L vision encoder operating on 384 × 384 low-resolution inputs. The hybrid vision encoder thus combines SAM-B and SigLIP-L, efficiently encoding 1024 × 1024 images while preserving both semantic and fine-grained detail information.
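The resolution gap between the two branches can be made concrete with a little patch arithmetic. A minimal sketch, assuming the common 16-pixel ViT patch size for both SAM ViT-B and SigLIP-L (verify against the released model configs):

```python
def patch_tokens(image_size: int, patch_size: int = 16) -> int:
    """Number of patch tokens when a square image is split into square patches."""
    side = image_size // patch_size
    return side * side

# SigLIP-L branch: 384x384 low-resolution input -> 24x24 = 576 patch tokens
print(patch_tokens(384))
# SAM-B branch: 1024x1024 high-resolution input -> 64x64 = 4096 patch tokens
print(patch_tokens(1024))
```

This is why the high-resolution branch is reserved for low-level detail: at the same patch size it produces roughly 7× more tokens, which the adaptor must compress before they reach the LLM.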

Vision-Language Adaptor

A two-layer hybrid MLP bridges the vision encoder and the LLM. First, separate single-layer MLPs process the high-resolution and low-resolution features individually. These features are then concatenated along the feature dimension and transformed into the LLM's input space by another MLP layer.
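The two-stage fusion above can be sketched with plain-list linear layers. All dimensions and names here are illustrative toys, not the paper's actual configuration:

```python
import random

random.seed(0)

def make_linear(d_in, d_out):
    # Toy weight init; W[j] holds the weights feeding output unit j.
    W = [[random.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]
    b = [0.0] * d_out
    return W, b

def linear(x, W, b):
    return [sum(xi * wi for xi, wi in zip(x, row)) + bj for row, bj in zip(W, b)]

# Separate single-layer MLPs, one per stream (hypothetical dims: 8 -> 4 each)
W_hi, b_hi = make_linear(8, 4)
W_lo, b_lo = make_linear(8, 4)
# Shared projection into the LLM input space (4 + 4 -> 6, hypothetical)
W_out, b_out = make_linear(8, 6)

def hybrid_adaptor(hi_feat, lo_feat):
    h = linear(hi_feat, W_hi, b_hi)   # project high-res (SAM-B) features
    l = linear(lo_feat, W_lo, b_lo)   # project low-res (SigLIP-L) features
    fused = h + l                     # list concat = feature-dim concatenation
    return linear(fused, W_out, b_out)

token = hybrid_adaptor([1.0] * 8, [0.5] * 8)
print(len(token))  # one LLM-space embedding per image token
```

A real implementation would also apply a nonlinearity between the two MLP layers and operate on whole token grids; the point here is only the per-stream projection followed by concatenation and a shared output projection.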

Language Model

The language model is built on DeepSeek LLM. It adopts a Pre-Norm structure (normalization is applied to each layer's input rather than to its output, as in Post-Norm; Pre-Norm has been widely adopted in Transformer variants in recent years because it mitigates vanishing gradients and improves training stability), uses RMSNorm as the normalization function and SwiGLU as the feed-forward activation, adopts rotary embeddings for positional encoding, and uses the same tokenizer as DeepSeek-LLM.
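The two component choices named above are easy to write out elementwise. A minimal sketch on single vectors (the linear projections that surround these ops in a real Transformer block are omitted):

```python
import math

def rms_norm(x, gain, eps=1e-6):
    # RMSNorm: divide by the root mean square of x (no mean subtraction
    # or bias term, unlike LayerNorm), then scale by a learned gain.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

def silu(v):
    # SiLU (swish): v * sigmoid(v), the "Swi" in SwiGLU
    return v / (1.0 + math.exp(-v))

def swiglu(a, b):
    # SwiGLU gating: SiLU(a) ⊙ b, where a and b come from two linear
    # projections of the same input (projections omitted in this sketch).
    return [silu(ai) * bi for ai, bi in zip(a, b)]

x = [3.0, 4.0]
print(rms_norm(x, [1.0, 1.0]))  # ~[0.8485, 1.1314]; RMS of [3, 4] is ~3.5355
```

In a Pre-Norm block, `rms_norm` is applied to the hidden state before the attention and feed-forward sublayers, and `swiglu` replaces the usual single-activation FFN gate.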
