
Developed by DeepSeek, a Chinese AI research lab operating under High-Flyer, DeepSeek V3 has been a standout in the AI field since its first open-source release in December 2024. Known for efficiency, strong performance, and ease of use, it continues to evolve quickly. Its latest update, released on March 24, 2025 as "DeepSeek V3 0324", brings subtle but impactful improvements. Let's walk through what changed and then try out the new DeepSeek V3 model.
Minor Version Upgrade: DeepSeek V3 0324
- The upgrade improves the user experience across DeepSeek's official website, mobile app, and mini-program, with the "Deep Thinking" mode now disabled by default. This suggests the focus is on streamlining interaction rather than changing core functionality.
- The API interface and usage remain unchanged, ensuring continuity for developers: existing integrations (e.g., via model='deepseek-chat') need no adjustment.
- No major architectural changes were announced, which suggests a refinement of the existing 671B-parameter Mixture-of-Experts (MoE) model that activates 37B parameters per token (see the toy routing sketch after this list).
- Availability: the updated model is live on DeepSeek's official platforms (website, app, mini-program) and on HuggingFace, where the "DeepSeek V3 0324" weights and technical report are available under the MIT license.
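To make the "activates 37B of 671B parameters" idea concrete, here is a toy top-k routing sketch. It is purely illustrative and not DeepSeek's actual router; the expert count, top-k value, and dimensions are made-up numbers:

# Toy Mixture-of-Experts routing sketch (illustrative only, NOT DeepSeek's code).
# With top-k gating, each token runs through only k of the n experts, so only
# a fraction of the model's parameters are active per token.
import numpy as np

rng = np.random.default_rng(0)
n_experts, top_k, d_model = 8, 2, 16  # hypothetical values for illustration

x = rng.normal(size=d_model)                  # one token's hidden state
router_w = rng.normal(size=(d_model, n_experts))

scores = x @ router_w                         # router logits, one per expert
probs = np.exp(scores - scores.max())
probs /= probs.sum()                          # softmax gate weights
chosen = np.argsort(probs)[-top_k:]           # indices of the top-k experts

experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
# Only the chosen experts are evaluated; their outputs are gate-weighted
output = sum(probs[i] * (x @ experts[i]) for i in chosen)

print(f"Activated experts {sorted(chosen.tolist())} out of {n_experts} "
      f"({top_k / n_experts:.0%} of expert parameters used for this token)")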
DeepSeek V3 0324表現如何?
X 上的一位使用者在我的內部工作臺上試用了新的DeepSeek V3,它在所有測試中的各項指標都有大幅提升。它現在是最好的非推理模型,超越了 Sonnet 3.5。

Source: X
DeepSeek V3 has also climbed the Chatbot Arena leaderboard:

Source: lmarena
How to Access the Latest DeepSeek V3?
- Website: test the updated V3 for free at deepseek.com.
- Mobile app: available on iOS and Android, updated to reflect the March 24 release.
- API: use model='deepseek-chat' as documented at api-docs.deepseek.com. Pricing remains $0.14 per million input tokens (promotional pricing through February 8, 2025, though an extension isn't ruled out); a minimal call is sketched below this list.
- HuggingFace: download the "DeepSeek V3 0324" weights and technical report from here.
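Since the endpoint and model name are unchanged, a minimal API call looks exactly as it did before the update. A sketch (substitute your own API key):

from openai import OpenAI

client = OpenAI(api_key="Your_api_key", base_url="https://api.deepseek.com")
reply = client.chat.completions.create(
    model="deepseek-chat",  # same model name as before the 0324 update
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(reply.choices[0].message.content)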
Trying Out the New DeepSeek V3 0324
I'll run the updated DeepSeek model both locally and via the API.
Using DeepSeek-V3-0324 Locally with the llm-mlx Plugin
Installation Steps
Here's what you need to run it on your machine (assuming you're using the llm CLI with the MLX backend):
!pip install llm
!llm install llm-mlx
!llm mlx download-model mlx-community/DeepSeek-V3-0324-4bit
This will:
- Install the core llm CLI
- Add the MLX backend plugin
- Download the 4-bit quantized model (DeepSeek-V3-0324-4bit), which is more memory-efficient
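To confirm the download succeeded, the llm CLI can list the models it knows about; the downloaded MLX model should appear in the output:

!llm models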
Running a Chat Prompt Locally
Example:
!llm chat -m mlx-community/DeepSeek-V3-0324-4bit 'Generate an SVG of a pelican riding a bicycle'
Output:

If the model runs successfully, it responds with an SVG snippet of a pelican riding a bicycle: charmingly goofy and delightful.
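If you'd rather capture the markup than chat interactively, the same model can be run as a one-shot prompt with the output redirected to a file (a sketch; the file name is arbitrary):

!llm -m mlx-community/DeepSeek-V3-0324-4bit 'Generate an SVG of a pelican riding a bicycle' > pelican.svg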
Using DeepSeek-V3-0324 via the API
Install the Required Package
!pip3 install openai
Yes, even though you're calling DeepSeek, you use OpenAI-compatible SDK syntax.
Python Script for API Interaction
Here is a cleaned-up, annotated version of the script:
from openai import OpenAI
import time

# Timing setup
start_time = time.time()

# Initialize client with your DeepSeek API key and base URL
client = OpenAI(
    api_key="Your_api_key",
    base_url="https://api.deepseek.com"  # This is important
)

# Send a streaming chat request
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "How many r's are there in Strawberry"},
    ],
    stream=True
)

# Handle streamed response and collect metrics
prompt_tokens = 0
generated_tokens = 0
full_response = ""

for chunk in response:
    if hasattr(chunk, "usage") and hasattr(chunk.usage, "prompt_tokens"):
        prompt_tokens = chunk.usage.prompt_tokens
    # Guard against usage-only chunks that carry an empty choices list
    if hasattr(chunk, "choices") and chunk.choices and hasattr(chunk.choices[0], "delta") and hasattr(chunk.choices[0].delta, "content"):
        content = chunk.choices[0].delta.content
        if content:
            generated_tokens += 1  # counts chunks, an approximation of tokens
            full_response += content
            print(content, end="", flush=True)

# Performance tracking
end_time = time.time()
total_time = end_time - start_time

# Token/sec calculations
prompt_tps = prompt_tokens / total_time if prompt_tokens > 0 else 0
generation_tps = generated_tokens / total_time if generated_tokens > 0 else 0

# Output metrics
print("\n\n--- Performance Metrics ---")
print(f"Prompt: {prompt_tokens} tokens, {prompt_tps:.3f} tokens-per-sec")
print(f"Generation: {generated_tokens} tokens, {generation_tps:.3f} tokens-per-sec")
print(f"Total time: {total_time:.2f} seconds")
print(f"Full response length: {len(full_response)} characters")
Output:
### Final Answer
After carefully examining each letter in "Strawberry," we find that the letter 'r' appears **3 times**.
**Answer:** There are **3 r's** in the word "Strawberry."

--- Performance Metrics ---
Prompt: 17 tokens, 0.709 tokens-per-sec
Generation: 576 tokens, 24.038 tokens-per-sec
Total time: 23.96 seconds
Full response length: 1923 characters

Click here to view the full code and output.
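One caveat: the script above approximates speed by counting each streamed chunk as one token. If you need exact counts, a non-streaming call returns a usage object directly. A minimal sketch with the same client setup:

from openai import OpenAI

client = OpenAI(api_key="Your_api_key", base_url="https://api.deepseek.com")

# Non-streaming call: the response carries exact token usage
resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "How many r's are there in Strawberry"}],
)
print(resp.choices[0].message.content)
print("Prompt tokens:", resp.usage.prompt_tokens)
print("Completion tokens:", resp.usage.completion_tokens)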
Building a Digital Marketing Website with DeepSeek-V3-0324
Next, let's use DeepSeek-V3-0324 to automatically generate a modern, sleek, compact digital marketing landing page through prompt-based code generation.
!pip3 install openai

# Please install OpenAI SDK first: `pip3 install openai`
from openai import OpenAI
import time

# Record the start time
start_time = time.time()

client = OpenAI(api_key="Your_API_KEY", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a Website Developer"},
        {"role": "user", "content": "Code a modern small digital marketing Landing page"},
    ],
    stream=True  # This line makes the response a stream of events
)

# Initialize variables to track tokens and content
prompt_tokens = 0
generated_tokens = 0
full_response = ""

# Process the stream
for chunk in response:
    # Track prompt tokens (usually only in the first chunk)
    if hasattr(chunk, "usage") and hasattr(chunk.usage, "prompt_tokens"):
        prompt_tokens = chunk.usage.prompt_tokens
    # Track generated content (guard against usage-only chunks with empty choices)
    if hasattr(chunk, "choices") and chunk.choices and hasattr(chunk.choices[0], "delta") and hasattr(chunk.choices[0].delta, "content"):
        content = chunk.choices[0].delta.content
        if content:
            generated_tokens += 1
            full_response += content
            print(content, end="", flush=True)

# Calculate timing metrics
end_time = time.time()
total_time = end_time - start_time

# Calculate tokens per second
if prompt_tokens > 0:
    prompt_tps = prompt_tokens / total_time
else:
    prompt_tps = 0
if generated_tokens > 0:
    generation_tps = generated_tokens / total_time
else:
    generation_tps = 0

# Print metrics
print("\n\n--- Performance Metrics ---")
print(f"Prompt: {prompt_tokens} tokens, {prompt_tps:.3f} tokens-per-sec")
print(f"Generation: {generated_tokens} tokens, {generation_tps:.3f} tokens-per-sec")
print(f"Total time: {total_time:.2f} seconds")
print(f"Full response length: {len(full_response)} characters")
Output:
The generated page is for a digital marketing agency called "NexaGrowth", with a modern, clean design and a carefully chosen color palette. The layout is responsive and built with modern web design techniques.
You can view the website here.
Click here to view the full code and output.
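To actually open the generated page, you'll want to pull the HTML out of full_response and write it to a file. A minimal sketch; the fence-stripping logic assumes the model wraps its answer in a ```html code block, which may vary:

# Assumes full_response holds the streamed answer from the script above
html = full_response
if "```" in html:
    html = html.split("```", 2)[1]        # keep the first fenced block
    if html.startswith("html"):
        html = html.split("\n", 1)[1]     # drop the "html" language tag line
with open("index.html", "w", encoding="utf-8") as f:
    f.write(html)
print("Saved landing page to index.html")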
Background on Previous Updates
For context on what's new, here is a brief recap of the V3 baseline before the March 24 update:
- Initial release: DeepSeek V3 launched with 671B parameters, trained on 14.8T tokens using roughly 2.664M H800 GPU hours at a cost of about $5.5-5.58 million. It introduced Multi-Head Latent Attention (MLA), Multi-Token Prediction (MTP), and auxiliary-loss-free load balancing, reaching 60 tokens/second and outperforming Llama 3.1 405B.
- Post-training: reasoning capability from DeepSeek R1 was distilled into V3, strengthening it through supervised fine-tuning (SFT) and reinforcement learning (RL) at a cost of only 0.124M additional GPU hours.
- The March update builds on this foundation, focusing on usability and targeted performance tuning rather than a full overhaul.
Conclusion
The DeepSeek V3 0324 update looks small but delivers substantial improvements. The model is now faster, handling tasks like math and coding briskly, and notably stable, producing good results consistently whether it's coding or problem-solving. It can also generate around 700 lines of code without breaking, which is great for anyone building things with code. It still uses the smart 671B-parameter MoE setup, and it remains inexpensive. Give the new DeepSeek V3 0324 a try and let me know what you think in the comments!