
    DeepSeek updates its GitHub repository; new model "MODEL1" surfaces

    Posted: 2026-04-18 09:25:02

By Qiao Ba

Edited by Xu Qingyang

On January 21, Beijing time, DeepSeek updated deepseek-ai/FlashMLA in its official GitHub repository. An AI-assisted analysis of all 114 code files in the repository (covering .py, .md, .txt, .sh, .cpp, .cu, and .h files) turned up a previously undisclosed model architecture identifier, "MODEL1", mentioned 31 times in total.

MODEL1 is one of the two main model architectures supported by DeepSeek FlashMLA; the other is DeepSeek-V3.2. V3.2 appears throughout the codebase as the current optimization target, for instance in the TFLOPS figures cited for pretraining and inference. MODEL1 diverges from V3.2 significantly in several core technical areas: KV-cache layout, quantization strategy, and hardware-level optimization.
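The identifier scan described above can be reproduced with a short script. This is a hypothetical sketch, not the tooling the author used; the repository path, the helper name, and the extension list are assumptions based only on the file types the article mentions.

```python
# Hypothetical sketch: count occurrences of an identifier such as "MODEL1"
# across a local clone of deepseek-ai/FlashMLA. Names here are illustrative.
from pathlib import Path

def count_mentions(repo_root: str, needle: str,
                   exts=(".py", ".md", ".txt", ".sh", ".cpp", ".cu", ".h")) -> int:
    """Return the total number of occurrences of `needle` in matching files."""
    total = 0
    for path in Path(repo_root).rglob("*"):
        if path.is_file() and path.suffix in exts:
            # Ignore undecodable bytes so binary-ish files don't abort the scan.
            text = path.read_text(encoding="utf-8", errors="ignore")
            total += text.count(needle)
    return total
```

Run against a clone of the repository, `count_mentions(path, "MODEL1")` would report the raw mention count the article refers to.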
[Image: AI-assisted illustration, not an official release]

MODEL1 is most likely a high-efficiency inference model: compared with V3.2 it should have a smaller memory footprint, suiting edge devices and cost-sensitive scenarios. It may also be a long-sequence specialist, optimized for sequences of 16K+ tokens and aimed at long-context work such as document understanding and code analysis.

Judging from the code structure, MODEL1 is not simply a scaled-down V3.2 but a different architectural choice: V3.2 pursues maximum performance and precision, while MODEL1 pursues efficiency and deployability. According to earlier reports, DeepSeek plans to release its next-generation flagship model around the Spring Festival in mid-February.

The MODEL1 code comes with a complete test suite and multi-architecture support. The codebase carries full SM90 and SM100 support; four decode configurations and two prefill configurations point to extensive testing, and the overall code quality and completeness suggest productization is already underway.

Judging from the enum definitions in the code, DeepSeek appears to have built full architecture-level support for two distinct models. In csrc/api/sparse_decode.h, the code identifies the model type automatically from the query-key dimension (d_qk): a d_qk of 576 is recognized as V3.2, and a d_qk of 512 as MODEL1.

This dispatch logic implies that MODEL1 uses a 512-dimensional query-key design, more compact than V3.2's 576 dimensions.
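The d_qk-based identification described above can be rendered as a small dispatch function. This is an illustrative Python sketch of the rule the article attributes to csrc/api/sparse_decode.h, not the repository's actual C++ API; the function and label names are assumptions.

```python
# Illustrative sketch of the dispatch rule attributed to csrc/api/sparse_decode.h:
# the model architecture is inferred from the query-key head dimension (d_qk).
def identify_model(d_qk: int) -> str:
    if d_qk == 576:   # per the article: V3.2's query-key width (512-dim NoPE + 64 extra dims)
        return "DeepSeek-V3.2"
    if d_qk == 512:   # per the article: MODEL1's more compact query-key width
        return "MODEL1"
    raise ValueError(f"unsupported d_qk: {d_qk}")
```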
MODEL1 and V3.2 differ markedly on several key parameters. V3.2 uses a 576-dimensional query-key and a 512-dimensional NoPE component; MODEL1 uses a 512-dimensional query-key and a 448-dimensional NoPE component. In quantization strategy, V3.2 uses a 128-byte quantization granularity with 4 quantization tiles, while MODEL1 uses a 64-byte granularity with 7 tiles. Most notably, MODEL1 stores its quantization scales in fp8_e8m0fnu format, a 75% storage saving relative to V3.2.

For the KV-cache layout, according to the configuration in tests/quant.py:

[Image: KV-cache layout configuration from tests/quant.py]

MODEL1's KV cache takes 584 bytes per token, 8 bytes less than V3.2's 592. The per-token saving looks negligible, but over a 32K-token sequence it adds up to roughly 256 KB of KV-cache memory, which matters for edge deployments and cost-sensitive applications.

Multi-architecture hardware support

MODEL1's hardware implementation spans multiple GPU architectures. On NVIDIA H100/H200 (SM90) there are two kernels: model1_persistent_h64.cu for the 64-head configuration and model1_persistent_h128.cu for the 128-head configuration. On the newest B200 (SM100) there is a dedicated Head64 kernel implementation.
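The KV-cache arithmetic quoted above checks out with a back-of-envelope sketch. The per-token sizes are the article's figures; their internal breakdown is not reconstructed here, only the aggregate saving.

```python
# Back-of-envelope check of the KV-cache figures quoted in the article.
V32_BYTES_PER_TOKEN = 592     # per the article
MODEL1_BYTES_PER_TOKEN = 584  # per the article

def kv_cache_saving(seq_len: int) -> int:
    """Bytes saved by MODEL1 vs V3.2 over one sequence of `seq_len` tokens."""
    return seq_len * (V32_BYTES_PER_TOKEN - MODEL1_BYTES_PER_TOKEN)

# 32K tokens: 8 bytes/token * 32768 tokens = 262144 bytes = 256 KiB,
# matching the "~256 KB" figure above.
#
# Separately, the quoted 75% scale-storage saving is consistent with storing
# each quantization scale as fp8_e8m0fnu (1 byte) rather than fp32 (4 bytes),
# though the article does not spell out the baseline.
```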
Particularly noteworthy: the SM100 Head128 implementation supports only MODEL1, not V3.2, which suggests DeepSeek is optimizing the MODEL1 architecture specifically for the newest GPU generation.

Test configurations hint at the target workloads

In the sparse-decode tests, MODEL1 has four configurations, covering 64-head and 128-head settings at a sequence length of 16384 (16K tokens), with support for a two-level sparse attention mechanism. In the prefill tests, MODEL1 has two configurations, with input sequence lengths from 8K to 131K, 64- and 128-head settings, and a dynamic top-k from 512 to 1024. This indicates MODEL1 is designed for long-sequence inference, particularly applications that need efficient sparse attention.
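The test matrix described above can be written out as data. Only the counts (four decode configurations, two prefill configurations) and the value ranges come from the article; the exact pairings of head counts, sequence lengths, and top-k values below are assumptions for illustration.

```python
# Hypothetical tabulation of MODEL1's test matrix. Counts and value ranges
# follow the article; the specific pairings are assumed, not from the repo.
DECODE_CONFIGS = [  # sparse-decode tests: sequence length fixed at 16K tokens
    {"heads": h, "seq_len": 16384, "sparse_levels": 2, "topk": k}
    for h in (64, 128) for k in (512, 1024)
]

PREFILL_CONFIGS = [  # prefill tests: long inputs, 8K up to 131K tokens
    {"heads": 64, "seq_len": 8 * 1024, "topk": 512},
    {"heads": 128, "seq_len": 131 * 1024, "topk": 1024},
]
```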
