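Phi-4-mini vs. o1-mini: Which SLM Is Better?

AI models continue to advance, particularly small language models (SLMs), where efficiency and performance are critical. Among the latest contenders, Phi-4-mini and o1-mini stand out as capable and efficient models. In this article, we compare Phi-4-mini and o1-mini on user experience, speed, and performance in STEM applications and coding tasks. We evaluate their strengths in programming, debugging, and overall efficiency to see which model performs better. By the end, you will have a clear picture of which model fits your needs.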

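What Is Phi-4-mini?

Phi-4-mini is a state-of-the-art SLM designed for high-performance reasoning and coding tasks. It balances efficiency and accuracy, which makes it a strong contender for AI-driven applications. The model is built for accurate text generation and complex reasoning while remaining computationally efficient, so it is well suited to edge computing environments.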

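Architecture Overview

Phi-4-mini is a dense, decoder-only transformer with 3.8 billion parameters and a 128K-token context window. It has a vocabulary of 200,064 tokens and uses grouped-query attention (GQA) to reduce resource usage while maintaining performance.

Grouped-query attention (GQA) is an efficient attention mechanism that divides the query heads into groups, with each group sharing one key/value head. It sits between multi-query attention (MQA), which is faster, and multi-head attention (MHA), which gives higher quality, and it speeds up inference in language models.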
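
The grouping idea can be sketched in a few lines of plain Python. This is only an illustration: the head counts, head dimension, and sequence length are assumptions picked for the example rather than Phi-4-mini's published configuration, and the helper names are hypothetical. It shows which key/value head each query head reads from, and how the number of cached key/value entries per layer shrinks as heads are shared.

```python
def kv_head_for_query_head(q_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """Return the index of the key/value head that a given query head reads from."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be divisible by num_kv_heads")
    group_size = num_q_heads // num_kv_heads  # query heads per shared KV head
    return q_head // group_size


def kv_cache_entries(num_kv_heads: int, head_dim: int, seq_len: int) -> int:
    """Cached numbers per layer: one key and one value vector per KV head and position."""
    return 2 * num_kv_heads * head_dim * seq_len


# Illustrative sizes only (assumptions, not taken from the article).
NUM_Q_HEADS = 24
HEAD_DIM = 128
SEQ_LEN = 1024

for name, num_kv_heads in [("MHA", 24), ("GQA", 8), ("MQA", 1)]:
    mapping = [kv_head_for_query_head(q, NUM_Q_HEADS, num_kv_heads) for q in range(NUM_Q_HEADS)]
    cache = kv_cache_entries(num_kv_heads, HEAD_DIM, SEQ_LEN)
    print(f"{name}: {num_kv_heads} KV heads, {cache:,} cached entries per layer, "
          f"first six query heads -> KV heads {mapping[:6]}")
```

With these assumed sizes, MHA gives every query head its own key/value head, MQA points all of them at a single one, and GQA falls in between.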

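Key Features

Shared input-output embeddings: Reusing the same embeddings for input and output reduces memory overhead.
Training data: Trained on 5 trillion tokens, including high-quality educational material, coding examples, and synthetic data tailored for reasoning.
Performance: Performs well at reasoning, math, coding, and instruction following, and can integrate external APIs through function calling.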
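
As a rough illustration of why shared input-output embeddings save memory, the sketch below counts the parameters in the embedding matrices with and without sharing. The vocabulary size is the one quoted above; the hidden width is a hypothetical value chosen only for the arithmetic, and the function name is made up for this example.

```python
VOCAB_SIZE = 200_064   # vocabulary size quoted in the article
HIDDEN_SIZE = 3_072    # hypothetical hidden width, for illustration only


def embedding_parameters(vocab_size: int, hidden_size: int, tied: bool) -> int:
    """Count parameters in the input embedding and the output projection."""
    one_matrix = vocab_size * hidden_size
    # With tying, the output projection reuses the input embedding matrix.
    return one_matrix if tied else 2 * one_matrix


untied = embedding_parameters(VOCAB_SIZE, HIDDEN_SIZE, tied=False)
tied = embedding_parameters(VOCAB_SIZE, HIDDEN_SIZE, tied=True)
print(f"untied: {untied:,} parameters")
print(f"tied:   {tied:,} parameters")
print(f"saved:  {untied - tied:,} parameters")
```

Under this assumption the saving is one full vocabulary-by-hidden matrix, which is a noticeable share of a model with only a few billion parameters.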

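What Is o1-mini?

o1-mini is a lightweight, cost-effective SLM that aims to balance affordability and performance. It prioritizes efficient processing while maintaining reasonable accuracy for general AI applications.

Architecture Overview

o1-mini uses a standard transformer architecture and is described as having fewer parameters than Phi-4-mini, although its exact size has not been disclosed. It also supports a 128K-token context window, but it focuses on cost-effective processing rather than architectural optimizations such as GQA.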

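Model Comparison: Phi-4-mini vs. o1-mini

Phi-4-mini is a capable model designed for tasks such as reasoning, math, and coding, while o1-mini has a simpler design focused on cost-effective coding. The table below lists their main differences:

Feature	Phi-4-mini	o1-mini
Architecture type	Dense, decoder-only transformer	Standard transformer (limited details)
Parameters	3.8 billion	Not specified (generally assumed smaller)
Context window	128K tokens	128K tokens
Attention mechanism	Grouped-query attention (GQA)	Not described in detail
Shared embeddings	Yes	Not stated
Training data volume	5 trillion tokens	Not specified
Performance focus	High accuracy in reasoning, math, and coding	Cost-effective on coding tasks
Deployment fit	Edge computing environments	General use, but less robust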

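Phi-4-mini stands out for features such as GQA and shared embeddings, which help it in reasoning, coding, and API integration. o1-mini, by contrast, lacks Phi-4-mini's documented architectural refinements but is a lighter, more cost-effective alternative optimized for coding. The choice between them depends on whether a given task prioritizes accuracy and reasoning ability or affordability and efficiency.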

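Reasoning Performance Evaluation

This section looks at how Phi-4-mini and o1-mini perform on reasoning compared with larger models. It focuses on how well they solve complex problems and reach logical conclusions, and highlights the differences in accuracy, efficiency, and clarity between smaller and larger models.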

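Phi-4-mini and o1-mini vs. Larger Models

The reasoning abilities of the reasoning-enhanced Phi-4-mini and of o1-mini were evaluated on several benchmarks, including AIME 2024, MATH-500, and GPQA Diamond. These benchmarks assess advanced mathematical reasoning and general problem-solving skills, and provide a basis for comparison with several larger models from DeepSeek, Bespoke, and OpenThinker.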

Model	AIME	MATH-500	GPQA Diamond
o1-mini*	63.6	90.0	60.0
DeepSeek-R1-Distill-Qwen-7B	53.3	91.4	49.5
DeepSeek-R1-Distill-Llama-8B	43.3	86.9	47.3
Bespoke-Stratos-7B*	20.0	82.0	37.8
OpenThinker-7B*	31.3	83.0	42.4
Llama-3-2-3B-Instruct	6.7	44.4	25.3
Phi-4-Mini	10.0	71.8	36.9
Phi-4-Mini (reasoning trained) (3.8B)	50.0	90.4	49.0

Source: HuggingFace

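Despite having only 3.8 billion parameters, the reasoning-trained Phi-4-mini performs strongly and outscores these larger models on all three benchmarks:

DeepSeek-R1-Distill-Llama-8B (8B parameters)
Bespoke-Stratos-7B (7B parameters)
OpenThinker-7B (7B parameters)

It also comes close to DeepSeek-R1-Distill-Qwen-7B, a larger 7B model, which further underlines its efficiency. o1-mini, whose parameter count is undisclosed, nevertheless leads by a wide margin on AIME and GPQA Diamond, making it a strong contender on AI reasoning tasks.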

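Benchmark Comparison

The scores in the table above show how competitive both models are with larger ones:

AIME benchmark:

o1-mini scores 63.6, the highest of all the models.
Phi-4-mini (reasoning trained) scores 50.0, five times its base version's score of 10.0.

MATH-500 benchmark:

Phi-4-mini (reasoning trained) scores 90.4, slightly ahead of o1-mini at 90.0, which makes it very effective on complex mathematical reasoning tasks.

GPQA Diamond:

o1-mini leads clearly with 60.0, showing strong general problem-solving ability.
Phi-4-mini (reasoning trained) scores 49.0, ahead of several 7B and 8B models, which demonstrates its efficiency on reasoning tasks.

These results suggest that o1-mini has the advantage in general problem solving and reasoning, while the reasoning-trained Phi-4-mini does well on the math benchmarks despite its small size (3.8B parameters). Both models are remarkably efficient, challenging and in some cases surpassing larger models on key AI benchmarks.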
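
To make these comparisons easy to re-check, here is a small sketch that loads the scores from the table into a dictionary and recomputes the leaders. The scores are copied from the table; the helper names are hypothetical.

```python
# Scores copied from the benchmark table in this article.
SCORES = {
    "o1-mini": {"AIME": 63.6, "MATH-500": 90.0, "GPQA Diamond": 60.0},
    "DeepSeek-R1-Distill-Qwen-7B": {"AIME": 53.3, "MATH-500": 91.4, "GPQA Diamond": 49.5},
    "DeepSeek-R1-Distill-Llama-8B": {"AIME": 43.3, "MATH-500": 86.9, "GPQA Diamond": 47.3},
    "Bespoke-Stratos-7B": {"AIME": 20.0, "MATH-500": 82.0, "GPQA Diamond": 37.8},
    "OpenThinker-7B": {"AIME": 31.3, "MATH-500": 83.0, "GPQA Diamond": 42.4},
    "Llama-3-2-3B-Instruct": {"AIME": 6.7, "MATH-500": 44.4, "GPQA Diamond": 25.3},
    "Phi-4-Mini": {"AIME": 10.0, "MATH-500": 71.8, "GPQA Diamond": 36.9},
    "Phi-4-Mini (reasoning trained)": {"AIME": 50.0, "MATH-500": 90.4, "GPQA Diamond": 49.0},
}


def best_model(benchmark: str) -> str:
    """Return the model with the highest score on one benchmark."""
    return max(SCORES, key=lambda model: SCORES[model][benchmark])


def models_beaten(model: str, benchmark: str) -> list:
    """Return the models that score strictly lower than `model` on one benchmark."""
    own_score = SCORES[model][benchmark]
    return sorted(m for m, s in SCORES.items() if s[benchmark] < own_score)


for benchmark in ("AIME", "MATH-500", "GPQA Diamond"):
    print(f"{benchmark}: best = {best_model(benchmark)}")

phi = "Phi-4-Mini (reasoning trained)"
aime_gain = SCORES[phi]["AIME"] / SCORES["Phi-4-Mini"]["AIME"]
print(f"AIME gain from reasoning training: {aime_gain:.1f}x")
print("Beaten on GPQA Diamond:", models_beaten(phi, "GPQA Diamond"))
```

One detail the sketch surfaces is that the top MATH-500 score in the table belongs to DeepSeek-R1-Distill-Qwen-7B, with the reasoning-trained Phi-4-mini and o1-mini close behind.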

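Phi-4-mini vs. o1-mini: Reasoning and Coding Capabilities

We now compare the reasoning and programming capabilities of Phi-4-mini and o1-mini. To do this, we give both models the same prompts and evaluate their responses. We call o1-mini through the OpenAI API and load Phi-4-mini with the Hugging Face transformers library. These are the tasks in this comparison:

Analyzing building order relationships
Mathematical logical reasoning
Finding the longest substring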
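
Since each task reports how long a model took, the measurement itself can be factored into a small helper. The sketch below is one way to do that under stated assumptions: `timed_call` and `echo_model` are hypothetical names, and `echo_model` is a stand-in for a real model call, not part of either model's API.

```python
import time
from typing import Callable, Tuple


def timed_call(generate: Callable[[str], str], prompt: str) -> Tuple[str, float]:
    """Run generate(prompt) and return its answer with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    answer = generate(prompt)
    elapsed = time.perf_counter() - start
    return answer, elapsed


def echo_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; it only echoes the prompt."""
    return f"echo: {prompt}"


answer, seconds = timed_call(echo_model, "Which building is in the middle?")
print(answer)
print(f"elapsed: {seconds:.6f} s")
```

Timing only the generation call, and not model loading or client setup, keeps the numbers for a hosted API and a locally loaded model closer to comparable.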

Task 1: Analyzing Building Order Relationships

This task asks the model to infer the relative positions of buildings from the given constraints and identify the building in the middle.

Prompt: There are five buildings called V, W, X, Y and Z in a row (not necessarily in that order). V is to the West of W. Z is to the East of X and the West of V, W is to the West of Y. Which is the building in the middle? Options: A) V B) W C) X D) Y

Input to o1-mini

from openai import OpenAI
import time
import tiktoken
from IPython.display import display, Markdown

# Read the API key from a local file
with open("path_to_api_key") as file:
    api_key = file.read().strip()

task1_start_time = time.time()

client = OpenAI(api_key=api_key)

messages = [
    {
        "role": "user",
        "content": """
There are five buildings called V, W, X, Y and Z in a row (not necessarily in that order).
V is to the West of W. Z is to the East of X and the West of V, W is to the West of Y.
Which is the building in the middle?
Options:
A) V
B) W
C) X
D) Y
"""
    }
]

completion = client.chat.completions.create(
    model="o1-mini-2024-09-12",
    messages=messages
)

task1_end_time = time.time()

# Print results
print(completion.choices[0].message)
print("----------------=Total Time Taken for task 1:----------------- ", task1_end_time - task1_start_time)

# Display result
display(Markdown(completion.choices[0].message.content))

o1-mini's Response

[Screenshot of the response: analyzing building order relationships]

Input to Phi-4-mini

import time
from IPython.display import display, Markdown
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

# Load the weights in 8-bit to reduce memory use
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

# Load model directly
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-4-mini-instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-4-mini-instruct", trust_remote_code=True, quantization_config=quantization_config)

# Start timing after the model is loaded
task1_start_time = time.time()

messages = [
    {"role": "system", "content": "You are an expert in solving numerical and general reasoning questions."},
    {"role": "user", "content": """There are five buildings called V, W, X, Y and Z in a row (not necessarily in that order).
V is to the West of W. Z is to the East of X and the West of V, W is to the West of Y.Which is the building in the middle? Options:
A) V
B) W
C) X
D) Y"""},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# Greedy decoding, returning only the newly generated text
generation_args = {
    "max_new_tokens": 1024,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(messages, **generation_args)

task1_end_time = time.time()

print("----------------=Total Time Taken for task 1:----------------- ", task1_end_time - task1_start_time)

display(Markdown(output[0]['generated_text']))

Phi-4-mini's Response

[Screenshot of the response: analyzing building order relationships]

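Comparative Analysis

o1-mini reached the correct answer ("V") quickly and in only a few steps, while Phi-4-mini took longer because it worked through every detail step by step. Despite that effort, Phi-4-mini still arrived at a wrong answer ("Z"), which is not even one of the options. This suggests that Phi-4-mini struggles with simple logic problems that o1-mini handles quickly and correctly. Phi-4-mini's detailed reasoning may help on harder problems, but here it only caused delay and error.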
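
The expected answer can be confirmed independently of either model with a brute-force search. This sketch tries every ordering of the five buildings, keeps those that satisfy the three constraints from the prompt, and reports which building sits in the middle. The function and variable names are illustrative.

```python
from itertools import permutations

BUILDINGS = "VWXYZ"


def satisfies_constraints(order) -> bool:
    """Check the prompt's constraints; a smaller index means further west."""
    pos = {building: index for index, building in enumerate(order)}
    return (
        pos["V"] < pos["W"]                  # V is to the west of W
        and pos["X"] < pos["Z"] < pos["V"]   # Z is east of X and west of V
        and pos["W"] < pos["Y"]              # W is to the west of Y
    )


valid_orders = ["".join(p) for p in permutations(BUILDINGS) if satisfies_constraints(p)]
middle_buildings = {order[2] for order in valid_orders}

print("Valid west-to-east orders:", valid_orders)
print("Building(s) in the middle:", middle_buildings)
```

The constraints form a single chain (X, Z, V, W, Y), so only one ordering survives and the middle building is V, which matches option A.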

Task 2: Mathematical Logical Reasoning

This task asks the model to identify the pattern in a given number series and find the missing number.

Prompt: Select the number from among the given options that can replace the question mark (?) in the following series: 16, 33, 100, 401, ? Options: A) 1235 B) 804 C) 1588 D) 2006

Input to o1-mini

# Reuses the imports and api_key from Task 1
task2_start_time = time.time()

client = OpenAI(api_key=api_key)

messages = [
    {
        "role": "user",
        "content": """Select the number from among the given options that can replace the question mark (?) in the following series.16, 33, 100, 401, ?
A) 1235
B) 804
C) 1588
D) 2006"""
    }
]

# Estimate token counts with cl100k_base; newer OpenAI models may use a
# different encoding (for example o200k_base), so treat the counts as approximate
encoding = tiktoken.get_encoding("cl100k_base")

# Calculate token counts
input_tokens = sum(len(encoding.encode(msg["content"])) for msg in messages)

completion = client.chat.completions.create(
    model="o1-mini-2024-09-12",
    messages=messages
)

output_tokens = len(encoding.encode(completion.choices[0].message.content))

task2_end_time = time.time()

# Print results
print(completion.choices[0].message)
print("----------------=Total Time Taken for task 2:----------------- ", task2_end_time - task2_start_time)

# Display result
display(Markdown(completion.choices[0].message.content))

o1-mini's Response

[Screenshot of the response: mathematical logical reasoning]

Input to Phi-4-mini

# Reuses the model, tokenizer, and imports loaded in Task 1
task2_start_time = time.time()

messages = [
    {"role": "system", "content": "You are an expert in solving numerical and general reasoning questions."},
    {"role": "user", "content": """Select the number from among the given options
that can replace the question mark (?) in the following series.16, 33, 100, 401, ?
A) 1235
B) 804
C) 1588
D) 2006"""},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# Greedy decoding, returning only the newly generated text
generation_args = {
    "max_new_tokens": 1024,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(messages, **generation_args)

task2_end_time = time.time()

print("----------------=Total Time Taken for task 2:----------------- ", task2_end_time - task2_start_time)

display(Markdown(output[0]['generated_text']))

Phi-4-mini's Response

[Screenshot of the response: mathematical logical reasoning]

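Comparative Analysis

On the number-pattern task, o1-mini beat Phi-4-mini on both speed and accuracy. o1-mini identified the pattern with a clear, direct approach and correctly chose 2006 in 10.77 seconds. Phi-4-mini took much longer (50.25 seconds) and still produced a wrong answer (120), which is not among the options. This suggests that o1-mini is better at spotting number patterns quickly, while Phi-4-mini tends to overcomplicate simple problems, leading to errors and delays.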
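
The article does not spell out the rule behind the series, so the sketch below assumes one pattern that fits every given term: multiply by 2, 3, 4, ... and add 1 each time. Under that assumption it checks the given terms and computes the next one; `next_term` is a hypothetical helper name.

```python
def next_term(series):
    """Extend a series that follows a(n+1) = a(n) * k + 1 with k = 2, 3, 4, ..."""
    multiplier = 2
    for previous, current in zip(series, series[1:]):
        if current != previous * multiplier + 1:
            raise ValueError("series does not follow the assumed rule")
        multiplier += 1
    return series[-1] * multiplier + 1


series = [16, 33, 100, 401]
options = {"A": 1235, "B": 804, "C": 1588, "D": 2006}

predicted = next_term(series)
matching = [label for label, value in options.items() if value == predicted]

print("Predicted next term:", predicted)
print("Matching option(s):", matching)
```

Step by step, 16 x 2 + 1 = 33, 33 x 3 + 1 = 100, 100 x 4 + 1 = 401, and 401 x 5 + 1 = 2006, which corresponds to option D.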

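Task 3: Finding the Longest Substring

This problem asks for the length of the longest substring of a given string that contains no repeated characters. For example, in the string "abcabcbb", the longest substring without repeating characters is "abc", which has length 3.

Prompt: Given a string s, find the length of the longest substring without repeating characters. Write a function lengthOfLongestSubstring(s: str) -> int that returns the length of the longest substring without repeating characters.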

Input to o1-mini

# Reuses the imports and api_key from Task 1
task3_start_time = time.time()

client = OpenAI(api_key=api_key)

messages = [
    {
        "role": "user",
        "content": """
Given a string s, find the length of the longest substring without repeating characters.
Write a function lengthOfLongestSubstring(s: str) -> int that returns the length of the longest substring without repeating characters.
"""
    }
]

# Estimate token counts with cl100k_base; newer OpenAI models may use a
# different encoding (for example o200k_base), so treat the counts as approximate
encoding = tiktoken.get_encoding("cl100k_base")

# Calculate token counts
input_tokens = sum(len(encoding.encode(msg["content"])) for msg in messages)

completion = client.chat.completions.create(
    model="o1-mini-2024-09-12",
    messages=messages
)

output_tokens = len(encoding.encode(completion.choices[0].message.content))

task3_end_time = time.time()

# Print results
print(completion.choices[0].message)
print("----------------=Total Time Taken for task 3:----------------- ", task3_end_time - task3_start_time)

# Display result
display(Markdown(completion.choices[0].message.content))

o1-mini's Response

def lengthOfLongestSubstring(s: str) -> int:
    """
    Returns the length of the longest substring without repeating characters in the string s.

    Parameters:
    s (str): The input string.

    Returns:
    int: The length of the longest substring without repeating characters.
    """
    # Dictionary to store the last positions of each character.
    char_map = {}
    # Initialize the left pointer of the window.
    left = 0
    # Initialize the maximum length.
    max_length = 0

    # Iterate over the string with the right pointer.
    for right, char in enumerate(s):
        # If the character is found in char_map and its last position is >= left pointer,
        # move the left pointer to one position right of its last occurrence to avoid duplicates.
        if char in char_map and char_map[char] >= left:
            left = char_map[char] + 1

        # Update/insert the character's latest index.
        char_map[char] = right

        # Calculate the length of the current window and update max_length if necessary.
        current_window_length = right - left + 1
        if current_window_length > max_length:
            max_length = current_window_length

        # Debug statements to trace the internal state (can be removed in production)
        # print(f"Current character: {char}, Left: {left}, Right: {right}, Current Window: '{s[left:right+1]}', Max Length: {max_length}")

    return max_length

# Example usage:
if __name__ == "__main__":
    test_strings = [
        "abcabcbb",
        "bbbbb",
        "pwwkew",
        "",
        "a",
        "dvdf"
    ]
    for s in test_strings:
        print(f"Longest substring without repeating characters in '{s}' is of length {lengthOfLongestSubstring(s)}.")

Output

[Screenshot of the output: finding the longest substring]

Input to Phi-4-mini

from transformers import pipeline

messages = [
    {"role": "user",
     "content": """Given a string s, find the length of the longest substring without repeating characters.
Write a function lengthOfLongestSubstring(s: str) -> int that returns the length of the longest substring without repeating characters."""},
]

pipe = pipeline("text-generation", model="microsoft/Phi-4-mini-instruct", trust_remote_code=True)
pipe(messages)

Phi-4-mini's Response

def lengthOfLongestSubstring(s: str) -> int:
    char_index_map = {}
    left = 0
    max_length = 0
    for right in range(len(s)):
        if s[right] in char_index_map and char_index_map[s[right]] >= left:
            left = char_index_map[s[right]] + 1
        char_index_map[s[right]] = right
        max_length = max(max_length, right - left + 1)
    return max_length

print("Longest substring without repeating characters in 'abcabcbb' is of length ", lengthOfLongestSubstring("abcabcbb"))
print("Longest substring without repeating characters in 'bbbbb' is of length ", lengthOfLongestSubstring("bbbbb"))

Output

[Screenshot of the output: finding the longest substring]

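Comparative Analysis

Both o1-mini and Phi-4-mini correctly used the sliding-window approach, but o1-mini's code is better organized and easier to follow. It includes a clear explanation, comments, test cases, and readable variable names. Phi-4-mini's solution is shorter but lacks explanation and structure, which would make it harder to understand in a large project. o1-mini was also faster and its solution is cleaner and more readable, whereas Phi-4-mini focused on keeping the code short.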
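
One way to back up the claim that the sliding-window logic is correct is to compare it against a slow but obviously correct brute-force version. The sketch below does that exhaustively for every string of up to six characters over a three-letter alphabet. The `sliding_window` function mirrors the logic both models produced; `brute_force` and the test setup are additions for illustration.

```python
from itertools import product


def sliding_window(s: str) -> int:
    """Sliding-window solution, following the same logic as both model answers."""
    last_seen = {}
    left = 0
    best = 0
    for right, char in enumerate(s):
        # Jump the window start past the previous occurrence of a repeated character.
        if char in last_seen and last_seen[char] >= left:
            left = last_seen[char] + 1
        last_seen[char] = right
        best = max(best, right - left + 1)
    return best


def brute_force(s: str) -> int:
    """Reference answer: from every start, extend until a character repeats."""
    best = 0
    for start in range(len(s)):
        seen = set()
        for char in s[start:]:
            if char in seen:
                break
            seen.add(char)
        best = max(best, len(seen))
    return best


# Compare both implementations on every string of length 0..6 over "abc".
checked = 0
for length in range(7):
    for chars in product("abc", repeat=length):
        candidate = "".join(chars)
        assert sliding_window(candidate) == brute_force(candidate), candidate
        checked += 1

print(f"Both implementations agree on {checked} strings.")
```

A small alphabet forces many repeats, so short exhaustive inputs exercise the window-shrinking branch heavily while still running in well under a second.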

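Overall Comparative Analysis

Here is the overall comparison across all three tasks:

Aspect	Task 1 (Building order)	Task 2 (Number series completion)	Task 3 (Longest non-repeating substring)
Accuracy	o1-mini was correct, while Phi-4-mini gave a wrong answer ("Z", which is not an option).	o1-mini correctly identified 2006, while Phi-4-mini gave a wrong answer (120).	Both used the correct sliding-window approach.
Response speed	o1-mini was noticeably faster.	o1-mini was much faster (10.77 seconds vs. 50.25 seconds).	o1-mini responded slightly faster.
Approach	o1-mini reasoned quickly and logically, while Phi-4-mini took unnecessary steps and still made a mistake.	o1-mini followed a structured, efficient pattern-recognition approach, while Phi-4-mini overcomplicated the process and reached a wrong result.	o1-mini provided a well-structured, well-documented solution, while Phi-4-mini took a concise but less readable approach.
Coding practices	Not applicable.	Not applicable.	o1-mini included a docstring, comments, and test cases, which makes its code easier to understand and maintain. Phi-4-mini focused on brevity but omitted documentation.
Best use case	o1-mini is more reliable on logical reasoning tasks, while Phi-4-mini's step-by-step approach may suit more complex problems.	o1-mini is fast and accurate at number-pattern recognition, while Phi-4-mini's over-analysis can lead to errors.	o1-mini is better for structured, maintainable code, while Phi-4-mini is better for short, compact implementations.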

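Conclusion

Overall, o1-mini did better on structured reasoning, accuracy, and coding best practices, which makes it the better fit for complex problem solving and maintainable code. Phi-4-mini was slower in all three tasks, and its exploratory approach sometimes led to inefficiency or wrong conclusions, particularly on the reasoning tasks. For coding, o1-mini produced a well-documented, readable solution, while Phi-4-mini favored brevity at the expense of clarity. If a compact model that can run locally or at the edge is the main requirement, Phi-4-mini remains a reasonable choice, but for accuracy, clarity, and structured problem solving, o1-mini is the better option.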
