
In today's fast-paced digital world, businesses are constantly looking for innovative ways to improve customer engagement and streamline support. One effective solution is an AI-powered customer support voice agent. These AI voice bots can understand and respond to voice-based customer support queries in real time. They use conversational AI to automate interactions, reduce wait times, and make customer support more efficient. In this article, we'll take a comprehensive look at AI voice support agents and learn how to build one using Deepgram and the pygame library.
What is a Voice Agent?
A voice agent is an AI-powered agent designed to interact with users through spoken communication. It understands spoken language, processes requests, and generates human-like responses. It enables seamless voice interaction, reducing the need for manual input and improving the user experience. Unlike traditional chatbots that rely solely on text input, a voice agent enables hands-free, real-time conversation, making it a more natural and efficient way to interact with technology.
Voice Agents vs. Traditional Chatbots

| Feature | Voice Agent | Traditional Chatbot |
| --- | --- | --- |
| Input method | Voice | Text |
| Response method | Voice | Text |
| Hands-free use | Yes | No |
| Response time | Faster, real-time | Slightly delayed, depends on typing speed |
| Accent understanding | Advanced (varies by model) | Not applicable |
| Multimodal capability | Can integrate text and voice | Primarily text-based |
| Context retention | Higher, can remember past interactions | Varies, usually limited to text history |
| User experience | More natural | Requires typing |
Key Components of a Voice Agent
A voice agent is an AI-driven system that facilitates voice-based interaction, commonly used in customer support, virtual assistants, and automated call centers. It uses speech recognition, natural language processing (NLP), and text-to-speech technology to understand user queries and deliver appropriate responses.
In this section, we'll walk through the key components of a voice agent that enable seamless, efficient voice communication.

1. Automatic Speech Recognition (ASR) – Speech-to-Text Conversion
The first step in a voice agent's workflow is converting spoken language into text. This is done with Automatic Speech Recognition (ASR).
Code implementation:
- The Deepgram API is used for real-time speech transcription.
- The deepgram.listen.live.v("1") method captures live audio and transcribes it into text.
- The LiveTranscriptionEvents.Transcript event processes and extracts the speech.
2. Natural Language Processing (NLP) – Understanding User Intent
Once speech is transcribed into text, the system needs to process and understand it. Here, an OpenAI LLM (GPT model) is used for natural language understanding (NLU).
Code implementation:
- The transcribed text is appended to a conversation list.
- The GPT model (gpt-4o-mini in our code) processes it and generates an intelligent reply.
- A system message (system_message) defines the agent's personality and scope.
3. Text-to-Speech (TTS) – Generating Audio Responses
Once the system generates a reply, it needs to be converted into speech for a natural conversational experience. Deepgram's Aura Helios TTS model is used for speech generation.
Code implementation:
- The generate_audio() function sends the generated text to Deepgram's TTS API (DEEPGRAM_URL).
- The response is an audio file, which is then played using pygame.mixer.
4. Real-Time Audio Processing and Playback
For a real-time voice agent, the generated speech must be played back as soon as it is produced. Pygame's mixer module handles audio playback.
Code implementation:
- The play_audio() function plays the generated audio using pygame.mixer.
- While a response is playing, the microphone is muted to prevent unintended audio interference.
5. Event Handling and Conversation Flow
A real-time voice agent needs to handle multiple events, such as opening and closing connections, processing speech, and handling errors.
Code implementation:
- Event listeners are registered to handle ASR (on_message), utterance detection (on_utterance_end), and errors (on_error).
- This ensures user input and server responses are handled smoothly.
6. Microphone Input and Voice Control
A key aspect of a voice agent is capturing live user input through the microphone. Deepgram's Microphone module is used for real-time audio streaming.
Code implementation:
- The microphone continuously listens and sends audio data for ASR processing.
- The system can mute/unmute the microphone while a response is playing.
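Before diving into the real APIs, the six components above can be pictured as one simple loop. Here is a minimal, self-contained sketch of that loop; `transcribe`, `respond`, and `synthesize` are hypothetical stand-ins for the Deepgram ASR, OpenAI, and Deepgram TTS calls used later in the article.

```python
# Minimal sketch of the voice-agent loop. The three helper functions are
# illustrative stand-ins, NOT the real Deepgram/OpenAI APIs.

def transcribe(audio: bytes) -> str:
    """ASR stand-in: speech to text (pretend the audio is already text)."""
    return audio.decode("utf-8")

def respond(text: str, history: list) -> str:
    """NLU stand-in: generate a reply and keep conversation history."""
    history.append({"role": "user", "content": text})
    reply = f"You said: {text}"
    history.append({"role": "assistant", "content": reply})
    return reply

def synthesize(text: str) -> bytes:
    """TTS stand-in: text back to audio bytes."""
    return text.encode("utf-8")

def handle_turn(audio: bytes, history: list) -> bytes:
    """One full turn: ASR -> NLU -> TTS."""
    return synthesize(respond(transcribe(audio), history))

history = []
audio_out = handle_turn(b"my tire is flat", history)
print(audio_out.decode("utf-8"))  # -> You said: my tire is flat
```

In the real agent, each stand-in is replaced by a network call, and the loop is driven by Deepgram's event callbacks rather than a direct function call.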
Getting the API Keys
Before we start building the voice agent, let's look at how to generate the required API keys.
1. Deepgram API Key
To get a Deepgram API key, head to Deepgram and sign up for an account. If you already have one, simply log in.
Once logged in, click Create API Key to generate a new key. Deepgram also offers $200 in free credits, so you can try its services without any upfront cost.

2. OpenAI API Key
To get an OpenAI API key, head to OpenAI and log in to your account. If you don't have an OpenAI account yet, sign up for one.
Once logged in, click Create new secret key to generate a new key.

Steps to Build a Voice Agent
Now we're ready to build a voice agent. In this guide, we'll build a customer support voice agent that helps users complete tasks, answers questions, and provides personalized assistance in a natural, intuitive way. So let's get started.
Step 1: Set Up the API Keys
APIs connect us to external services like speech recognition and text generation. To make sure only authorized users can access these services, we need to authenticate with API keys. For security, it's best to store keys in a separate text file or in environment variables, so the program can read and load them safely when needed.
import os
from dotenv import load_dotenv

with open("deepgram_apikey_path", "r") as f:
    API_KEY = f.read().strip()

with open("/openai_apikey_path", "r") as f:
    OPENAI_API_KEY = f.read().strip()

load_dotenv()
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
Step 2: Define the System Instructions
The voice assistant needs clear guidelines to ensure helpful, well-structured responses. These rules define the assistant's role, for example whether it is a customer support agent or a personal assistant. They also set the tone and style of responses, such as formal, casual, or professional. You can even control how detailed or concise the responses should be.
In this step, you write a system message that explains the agent's purpose and includes an example conversation to help generate more accurate and relevant responses.
system_message = """ You are a customer support agent specializing in vehicle-related issues like flat tires, engine problems, and maintenance tips.
# Instructions:
- Provide clear, easy-to-follow advice.
- Keep responses between 3 to 7 sentences.
- Offer safety recommendations where necessary.
- If a problem is complex, suggest visiting a professional mechanic.
# Example:
User: "My tire is punctured, what should I do?"
Response: "First, pull over safely and turn on your hazard lights. If you have a spare tire, follow your car manual to replace it. Otherwise, call for roadside assistance. Stay in a safe location while waiting."
"""
Step 3: Audio Text Processing
To produce more natural-sounding speech, we implement a dedicated AudioTextProcessor class that segments the text responses:
- The segment_text method uses a regular expression to split long responses at natural sentence boundaries.
- This lets the TTS engine handle each sentence with appropriate pauses and intonation.
- The result is a more human-like speech pattern, which improves the user experience.
class AudioTextProcessor:
    @staticmethod
    def segment_text(text):
        """Split text into segments at sentence boundaries for better TTS."""
        sentence_boundaries = re.finditer(r'(?<=[.!?])\s+', text)
        boundaries_indices = [boundary.start() for boundary in sentence_boundaries]
        segments = []
        start = 0
        for boundary_index in boundaries_indices:
            segments.append(text[start:boundary_index + 1].strip())
            start = boundary_index + 1
        segments.append(text[start:].strip())
        return segments
Temporary File Management
To handle audio files cleanly and efficiently, the enhanced implementation uses Python's tempfile module:
- Temporary files are created to store audio data during playback.
- Each audio file is automatically cleaned up after use.
- This prevents unused files from accumulating on the system and manages resources effectively.
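The pattern described above can be sketched in a few lines with the standard tempfile module; the audio bytes here are dummy placeholder data, not real TTS output.

```python
import os
import tempfile

# Dummy "audio" bytes standing in for the TTS response
audio_data = b"\x00\x01fake-audio-bytes"

# Create a temporary .mp3 file and write the audio data into it.
# delete=False lets us reopen the file by path for playback later.
with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as tmp:
    tmp_path = tmp.name
    tmp.write(audio_data)

# ... playback would happen here (e.g. with pygame) ...
assert os.path.exists(tmp_path)

# Remove the file once playback is done, so temp files never accumulate
os.remove(tmp_path)
assert not os.path.exists(tmp_path)
```

The same create-use-remove lifecycle is applied to the welcome, response, and farewell audio files in the agent.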
Threaded, Non-Blocking Audio Playback
A key improvement in the new implementation is the use of threading for audio playback:
- Audio responses play in a separate thread, outside the main program.
- This lets the voice agent keep listening and processing while it speaks.
- The microphone is muted during playback to prevent feedback loops.
- A threading.Event object (mic_muted) coordinates this behavior across threads.
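Here is a minimal, self-contained sketch of that coordination pattern. The `play_response` and `on_transcript` functions are simplified stand-ins for the agent's real playback thread and Deepgram transcript callback.

```python
import threading
import time

mic_muted = threading.Event()

def play_response(duration: float):
    """Simulated playback running in its own thread."""
    time.sleep(duration)   # stand-in for pygame playback
    mic_muted.clear()      # signal: playback finished, mic can listen again

def on_transcript(text: str, received: list):
    """Simulated ASR callback: drop input while the agent is speaking."""
    if mic_muted.is_set():
        return
    received.append(text)

received = []
mic_muted.set()  # agent starts speaking: mute the mic
t = threading.Thread(target=play_response, args=(0.1,))
t.start()
on_transcript("ignored while speaking", received)  # dropped
t.join()
on_transcript("heard after playback", received)    # accepted
print(received)  # -> ['heard after playback']
```

Because `threading.Event` is thread-safe, the playback thread can flip the flag without any explicit locking in the transcript callback.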
Step 4: Implement Speech-to-Text Processing
To understand user commands, the voice assistant needs to convert speech into text. This is done with Deepgram's speech-to-text API, which transcribes speech in real time. It can handle different languages and accents, and it distinguishes interim (incomplete) transcriptions from final (confirmed) ones.
In this step, audio is first recorded from the microphone. The audio is then sent to Deepgram's API for processing, and the text output is received and stored for further use.
from deepgram import DeepgramClient, LiveTranscriptionEvents, LiveOptions, Microphone
import threading

# Initialize clients
deepgram_client = DeepgramClient(api_key=DEEPGRAM_API_KEY)

# Set up Deepgram connection
dg_connection = deepgram_client.listen.websocket.v("1")

# Define event handler callbacks
def on_open(connection, event, **kwargs):
    print("Connection Open")

def on_message(connection, result, **kwargs):
    # Ignore messages while the microphone is muted for the assistant's response
    if mic_muted.is_set():
        return
    sentence = result.channel.alternatives[0].transcript
    if len(sentence) == 0:
        return
    if result.is_final:
        is_finals.append(sentence)
        if result.speech_final:
            utterance = " ".join(is_finals)
            print(f"User said: {utterance}")
            is_finals.clear()
            # Process user input and generate response
            # [processing code here]

def on_speech_started(connection, speech_started, **kwargs):
    print("Speech Started")

def on_utterance_end(connection, utterance_end, **kwargs):
    if len(is_finals) > 0:
        utterance = " ".join(is_finals)
        print(f"Utterance End: {utterance}")
        is_finals.clear()

def on_close(connection, close, **kwargs):
    print("Connection Closed")

def on_error(connection, error, **kwargs):
    print(f"Handled Error: {error}")

# Register event handlers
dg_connection.on(LiveTranscriptionEvents.Open, on_open)
dg_connection.on(LiveTranscriptionEvents.Transcript, on_message)
dg_connection.on(LiveTranscriptionEvents.SpeechStarted, on_speech_started)
dg_connection.on(LiveTranscriptionEvents.UtteranceEnd, on_utterance_end)
dg_connection.on(LiveTranscriptionEvents.Close, on_close)
dg_connection.on(LiveTranscriptionEvents.Error, on_error)

# Configure live transcription options with advanced features
options = LiveOptions(
    model="nova-2",
    language="en-US",
    smart_format=True,
    encoding="linear16",
    channels=1,
    sample_rate=16000,
    interim_results=True,
    utterance_end_ms="1000",
    vad_events=True,
    endpointing=500,
)
addons = {
    "no_delay": "true"
}
Step 5: Handle the Conversation
Once the assistant has transcribed the user's speech into text, it needs to analyze the text and generate an appropriate reply. For this, we use an OpenAI model that understands the context of previous messages and generates human-like replies. It even keeps the conversation history, helping the assistant maintain continuity.
In this step, the assistant stores the user's queries and its replies in a conversation list. The gpt-4o-mini model is then used to generate a reply, which is returned as the assistant's response.
# Initialize OpenAI client
openai_client = OpenAI(api_key=OPENAI_API_KEY)

def get_ai_response(user_input):
    """Get response from OpenAI API."""
    try:
        # Add user message to conversation
        conversation.append({"role": "user", "content": user_input.strip()})
        # Prepare messages for API
        messages = [{"role": "system", "content": system_message}]
        messages.extend(conversation)
        # Get response from OpenAI
        chat_completion = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            temperature=0.7,
            max_tokens=150
        )
        # Extract and save assistant's response
        response_text = chat_completion.choices[0].message.content.strip()
        conversation.append({"role": "assistant", "content": response_text})
        return response_text
    except Exception as e:
        print(f"Error getting AI response: {e}")
        return "I'm having trouble processing your request. Please try again."
Step 6: Convert Text to Speech
The assistant should speak its response aloud, not just display text. For that, the text is converted into natural speech using Deepgram's text-to-speech API.
In this step, the assistant's text reply is sent to Deepgram's API, which processes it and returns an audio file of the speech. Finally, Python's pygame library plays the audio file, so the assistant speaks its response to the user.
class AudioTextProcessor:
    @staticmethod
    def segment_text(text):
        """Split text into segments at sentence boundaries for better TTS."""
        sentence_boundaries = re.finditer(r'(?<=[.!?])\s+', text)
        boundaries_indices = [boundary.start() for boundary in sentence_boundaries]
        segments = []
        start = 0
        for boundary_index in boundaries_indices:
            segments.append(text[start:boundary_index + 1].strip())
            start = boundary_index + 1
        segments.append(text[start:].strip())
        return segments

    @staticmethod
    def generate_audio(text, headers):
        """Generate audio using Deepgram TTS API."""
        payload = {"text": text}
        try:
            with requests.post(DEEPGRAM_TTS_URL, stream=True, headers=headers, json=payload) as r:
                r.raise_for_status()
                return r.content
        except requests.exceptions.RequestException as e:
            print(f"Error generating audio: {e}")
            return None

def play_audio(file_path):
    """Play audio file using pygame."""
    try:
        pygame.mixer.init()
        pygame.mixer.music.load(file_path)
        pygame.mixer.music.play()
        while pygame.mixer.music.get_busy():
            pygame.time.Clock().tick(10)
        # Stop the mixer and release resources
        pygame.mixer.music.stop()
        pygame.mixer.quit()
    except Exception as e:
        print(f"Error playing audio: {e}")
    finally:
        # Signal that playback is finished
        mic_muted.clear()
Step 7: Welcome and Farewell Messages
A well-designed voice agent greets the user at startup and delivers a farewell message on exit, creating a more engaging interaction. This helps set a friendly tone and ensures the interaction ends smoothly.
def generate_welcome_message():
    """Generate welcome message audio."""
    welcome_msg = "Hello, I'm Eric, your vehicle support assistant. How can I help with your vehicle today?"
    # Create temporary file for welcome message
    with tempfile.NamedTemporaryFile(delete=False, suffix='.mp3') as welcome_file:
        welcome_path = welcome_file.name
    # Generate audio for welcome message
    welcome_audio = audio_processor.generate_audio(welcome_msg, DEEPGRAM_HEADERS)
    if welcome_audio:
        with open(welcome_path, "wb") as f:
            f.write(welcome_audio)
        # Play welcome message
        mic_muted.set()
        threading.Thread(target=play_audio, args=(welcome_path,)).start()
    return welcome_path
Microphone Management
A key improvement is proper microphone management during the conversation:
- The microphone is automatically muted while the agent is speaking.
- This prevents the agent from "hearing" its own voice.
- A threading Event object coordinates this behavior across threads.
# Mute microphone and play response
mic_muted.set()
microphone.mute()
threading.Thread(target=play_audio, args=(temp_path,)).start()
time.sleep(0.2)
microphone.unmute()
Step 8: Exit Commands
To keep the interaction smooth and intuitive, the voice agent listens for common exit commands like "exit", "quit", "goodbye", or "bye". When an exit command is detected, the system acknowledges it and shuts down safely.
# Check for exit commands
if any(exit_cmd in utterance.lower() for exit_cmd in ["exit", "quit", "goodbye", "bye"]):
    print("Exit command detected. Shutting down...")
    farewell_text = "Thank you for using the vehicle support assistant. Goodbye!"
    with tempfile.NamedTemporaryFile(delete=False, suffix='.mp3') as temp_file:
        temp_path = temp_file.name
    farewell_audio = audio_processor.generate_audio(farewell_text, DEEPGRAM_HEADERS)
    if farewell_audio:
        with open(temp_path, "wb") as f:
            f.write(farewell_audio)
        # Mute microphone and play farewell
        mic_muted.set()
        microphone.mute()
        play_audio(temp_path)
        time.sleep(0.2)
    # Clean up and exit
    if os.path.exists(temp_path):
        os.remove(temp_path)
    # End the program
    os._exit(0)
Step 9: Error Handling and Stability
To ensure a seamless, resilient user experience, the voice agent must handle errors gracefully. Unexpected issues like network failures, missing audio responses, or invalid user input can disrupt the interaction if not handled properly.
Exception Handling
Try-except blocks are used throughout the code to catch errors and handle them gracefully:
- In the audio generation and playback functions.
- During API interactions with OpenAI and Deepgram.
- In the main event-handling loop.
try:
    # Generate audio for each segment
    with open(temp_path, "wb") as output_file:
        for segment_text in text_segments:
            audio_data = audio_processor.generate_audio(segment_text, DEEPGRAM_HEADERS)
            if audio_data:
                output_file.write(audio_data)
except Exception as e:
    print(f"Error generating or playing audio: {e}")
Resource Cleanup
Proper resource management is critical for a reliable application:
- Temporary files are removed after use.
- Pygame audio resources are released properly.
- The microphone and connection objects are closed on exit.
# Clean up
microphone.finish()
dg_connection.finish()

# Clean up welcome file
if os.path.exists(welcome_file):
    os.remove(welcome_file)
Step 10: Final Steps to Run the Voice Assistant
We need a main function to tie everything together and make sure the voice assistant runs smoothly. The main function will:
- Listen to the user's speech.
- Convert the speech to text and generate a reply with AI.
- Convert the reply into speech and play it back to the user.
This flow ensures a complete, seamless interaction between the assistant and the user.
"""Main function to run the voice assistant."""
print("Starting Vehicle Support Voice Assistant 'Eric'...")
print("Speak after the welcome message.")
print("\nPress Enter to stop the assistant...\n")
# Generate and play welcome message
welcome_file = generate_welcome_message()
time.sleep(0.5) # Give time for welcome message to start
# Initialize is_finalslist to store transcription segments
# Set up Deepgram connection
dg_connection = deepgram_client.listen.websocket.v("1")
# Register event handlers
# [event registration code here]
# Configure and start Deepgram connection
if not dg_connection.start(options, addons=addons):
print("Failed to connect to Deepgram")
microphone = Microphone(dg_connection.send)
# Wait for user to press Enter to stop
if os.path.exists(welcome_file):
print("Assistant stopped.")
if __name__ == "__main__":
def main():
"""Main function to run the voice assistant."""
print("Starting Vehicle Support Voice Assistant 'Eric'...")
print("Speak after the welcome message.")
print("\nPress Enter to stop the assistant...\n")
# Generate and play welcome message
welcome_file = generate_welcome_message()
time.sleep(0.5) # Give time for welcome message to start
try:
# Initialize is_finalslist to store transcription segments
is_finals = []
# Set up Deepgram connection
dg_connection = deepgram_client.listen.websocket.v("1")
# Register event handlers
# [event registration code here]
# Configure and start Deepgram connection
if not dg_connection.start(options, addons=addons):
print("Failed to connect to Deepgram")
return
# Start microphone
microphone = Microphone(dg_connection.send)
microphone.start()
# Wait for user to press Enter to stop
input("")
# Clean up
microphone.finish()
dg_connection.finish()
# Clean up welcome file
if os.path.exists(welcome_file):
os.remove(welcome_file)
print("Assistant stopped.")
except Exception as e:
print(f"Error: {e}")
if __name__ == "__main__":
main()
def main():
"""Main function to run the voice assistant."""
print("Starting Vehicle Support Voice Assistant 'Eric'...")
print("Speak after the welcome message.")
print("\nPress Enter to stop the assistant...\n")
# Generate and play welcome message
welcome_file = generate_welcome_message()
time.sleep(0.5) # Give time for welcome message to start
try:
# Initialize is_finalslist to store transcription segments
is_finals = []
# Set up Deepgram connection
dg_connection = deepgram_client.listen.websocket.v("1")
# Register event handlers
# [event registration code here]
# Configure and start Deepgram connection
if not dg_connection.start(options, addons=addons):
print("Failed to connect to Deepgram")
return
# Start microphone
microphone = Microphone(dg_connection.send)
microphone.start()
# Wait for user to press Enter to stop
input("")
# Clean up
microphone.finish()
dg_connection.finish()
# Clean up welcome file
if os.path.exists(welcome_file):
os.remove(welcome_file)
print("Assistant stopped.")
except Exception as e:
print(f"Error: {e}")
if __name__ == "__main__":
main()
For the complete version of the code, see here.
Note: Since we're currently using Deepgram's free tier, the agent's response time tends to be slower due to the free plan's limitations.
Voice Agent Use Cases
1. Customer Support Automation
Examples:
- Banking & finance: answering queries about account balances, transactions, or credit card bills.
- E-commerce: providing order status, return policies, or product recommendations.
- Airlines & travel: assisting with flight bookings, cancellations, and baggage policies.
Sample conversation:
- User: When will my order ship?
- Agent: Your order shipped on February 17 and is expected to arrive by February 20.
2. Healthcare Virtual Assistants
Examples:
- Hospitals & clinics: booking doctor appointments.
- Home care: reminding elderly patients to take their medication.
- Telemedicine: providing basic symptom analysis before connecting to a doctor.
Sample conversation:
- User: I have a headache and a fever. What should I do?
- Agent: Based on your symptoms, you may have a mild fever. Stay hydrated and get some rest. If symptoms persist, please consult a doctor.
3. In-Vehicle Voice Assistants
Examples:
- Navigation: "Find the nearest gas station."
- Music control: "Play my road trip playlist."
- Emergency help: "Call roadside assistance."
Sample conversation:
- User: How's the traffic on my route?
- Agent: Traffic is moderate. Your estimated arrival time is 45 minutes.
Conclusion
Voice agents are transforming communication by making interactions natural, efficient, and accessible. They have a wide range of use cases across industries like customer support, smart homes, healthcare, and finance.
By leveraging speech-to-text, text-to-speech, and NLP, they can understand context, deliver intelligent responses, and handle complex tasks seamlessly. As AI evolves, these systems will become more personalized and human-like; their ability to learn from interactions will let them deliver increasingly tailored, intuitive experiences, making them indispensable companions in both personal and professional settings.
Frequently Asked Questions
Q1. What is a voice agent?
A. A voice agent is an AI-driven system that processes speech, understands context, and responds intelligently using speech-to-text, NLP, and text-to-speech technologies.
Q2. What are the key components of a voice agent?
A. The main components are: Speech-to-Text (STT), which converts spoken input into text; Natural Language Processing (NLP), which understands and processes the input; Text-to-Speech (TTS), which converts the text reply into human-like speech; and the AI model, which generates meaningful, context-aware replies.
Q3. Where are voice agents used?
A. Voice agents are widely used in customer service, healthcare, virtual assistants, smart homes, banking, automotive support, and accessibility solutions.
Q4. Can voice agents understand different languages and accents?
A. Yes, many advanced voice agents support multiple languages and accents, improving accessibility and the user experience worldwide.
Q5. Will voice agents replace human support agents?
A. No. Voice agents are designed to assist and augment human agents by handling repetitive tasks, freeing them to focus on complex issues.