How to Build a Customer Support AI Voice Agent

In today's fast-paced digital world, businesses are constantly looking for innovative ways to improve customer engagement and streamline support services. One effective solution is an AI-powered customer support voice agent. These AI voice bots can understand and respond to voice-based customer support queries in real time. They use conversational AI to automate interactions, cutting wait times and making support more efficient. In this article, we take a complete look at AI voice support agents and learn how to build one using Deepgram and the pygame library.

What Is a Voice Agent?

A voice agent is an AI-powered agent designed to interact with users through spoken communication. It understands spoken language, processes requests, and generates human-like responses. It enables seamless voice interaction, reducing the need for manual input and improving the user experience. Unlike traditional chatbots that rely solely on text input, voice agents allow hands-free, real-time conversation, making them a more natural and efficient way to interact with technology.

How Voice Agents Differ from Traditional Chatbots

Feature               | Voice Agent                              | Traditional Chatbot
Input method          | Speech                                   | Text
Response method       | Speech                                   | Text
Hands-free use        | Yes                                      | No
Response time         | Faster, real time                        | Slightly delayed, depends on typing speed
Accent understanding  | Advanced (varies by model)               | Not applicable
Multimodal capability | Can combine text and speech              | Primarily text-based
Context retention     | Higher, can remember past interactions   | Varies, usually limited to text history
User experience       | More natural                             | Requires typing

Key Components of a Voice Agent

A voice agent is an AI-driven system that facilitates voice-based interaction, commonly used in customer support, virtual assistants, and automated call centers. It uses speech recognition, natural language processing (NLP), and text-to-speech technology to understand user queries and deliver appropriate replies.

In this section, we explore the key components a voice agent needs for seamless, efficient voice communication.

1. Automatic Speech Recognition (ASR) – Speech-to-Text Conversion

The first step in a voice agent's workflow is converting spoken language into text. This is done with automatic speech recognition (ASR).

Code implementation (see the sketch after this list):

  1. The Deepgram API is used for real-time speech transcription.
  2. The deepgram_client.listen.websocket.v("1") connection captures live audio and transcribes it to text.
  3. The LiveTranscriptionEvents.Transcript event handler processes the result and extracts the spoken text.
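
The snippet below is a minimal sketch of this component, condensed from the full Step 4 listing later in this article. It assumes the Deepgram key from Step 1 is available in a DEEPGRAM_API_KEY variable (the name used in the Step 4 code).

from deepgram import DeepgramClient, LiveTranscriptionEvents, LiveOptions

deepgram_client = DeepgramClient(api_key=DEEPGRAM_API_KEY)  # key loaded in Step 1
dg_connection = deepgram_client.listen.websocket.v("1")

def on_message(connection, result, **kwargs):
    # Each live result carries the best transcription alternative
    sentence = result.channel.alternatives[0].transcript
    if sentence:
        print(f"Transcript: {sentence}")

dg_connection.on(LiveTranscriptionEvents.Transcript, on_message)
dg_connection.start(LiveOptions(model="nova-2", language="en-US", smart_format=True))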

2. Natural Language Processing (NLP) – Understanding User Intent

Once the speech has been transcribed to text, the system needs to process and understand it. Here, an OpenAI LLM (a GPT model) is used for natural language understanding (NLU).

Code implementation (see the sketch after this list):

  1. The transcribed text is appended to a conversation list.
  2. An OpenAI chat model (gpt-4o-mini in the code shown later) processes the conversation and generates an intelligent reply.
  3. A system message (system_message) defines the agent's persona and scope.
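
A condensed sketch of this component; the full get_ai_response() function appears in Step 5. It assumes OPENAI_API_KEY (Step 1) and system_message (Step 2) are already defined.

from openai import OpenAI

openai_client = OpenAI(api_key=OPENAI_API_KEY)  # key loaded in Step 1

# One user turn; in the real agent this list grows with every exchange
conversation = [{"role": "user", "content": "My tire is punctured, what should I do?"}]
messages = [{"role": "system", "content": system_message}] + conversation

chat_completion = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    temperature=0.7,
    max_tokens=150,
)
print(chat_completion.choices[0].message.content)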

3. Text-to-Speech (TTS) – Generating the Audio Reply

After the system generates a reply, it needs to be converted to speech for a natural conversational experience. Deepgram's Aura (Helios voice) TTS model is used to generate the speech.

Code implementation (see the sketch after this list):

  1. The generate_audio() function sends the generated text to Deepgram's TTS API (DEEPGRAM_TTS_URL).
  2. The response is an audio file, which is then played with pygame.mixer.
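
A minimal sketch of this component. The later listings refer to DEEPGRAM_TTS_URL and DEEPGRAM_HEADERS without defining them, so the values below are assumptions based on Deepgram's speak endpoint and the Aura Helios voice; adjust them for your account and SDK version. API_KEY is the Deepgram key read in Step 1.

import requests
import pygame

# Assumed endpoint and headers (the later code calls these DEEPGRAM_TTS_URL / DEEPGRAM_HEADERS)
DEEPGRAM_TTS_URL = "https://api.deepgram.com/v1/speak?model=aura-helios-en"
DEEPGRAM_HEADERS = {"Authorization": f"Token {API_KEY}", "Content-Type": "application/json"}

response = requests.post(DEEPGRAM_TTS_URL, headers=DEEPGRAM_HEADERS, json={"text": "Hello! How can I help?"})
response.raise_for_status()

with open("reply.mp3", "wb") as f:
    f.write(response.content)

# Play the generated reply
pygame.mixer.init()
pygame.mixer.music.load("reply.mp3")
pygame.mixer.music.play()
while pygame.mixer.music.get_busy():
    pygame.time.Clock().tick(10)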

4. Real-Time Audio Processing and Playback

For a real-time voice agent, the generated speech must be played back as soon as it is produced. Pygame's mixer module is used to handle audio playback.

Code implementation:

  1. The play_audio() function plays the generated audio using pygame.mixer.
  2. The microphone is muted while the response is playing, to prevent unintended audio interference.

5. Event Handling and Conversation Flow

A real-time voice agent needs to handle several events, such as opening and closing the connection, processing speech, and handling errors.

Code implementation:

  1. Event listeners are registered for ASR results (on_message), utterance-end detection (on_utterance_end), and errors (on_error).
  2. This ensures that user input and server responses are handled smoothly.

6. Microphone Input and Voice Control

A key part of any voice agent is capturing live user input through a microphone. Deepgram's Microphone module is used to stream audio in real time.

Code implementation:

  • The microphone listens continuously and sends audio data for ASR processing.
  • The system can mute/unmute the microphone while a response is playing.

Getting the API Keys

Before we start building the voice agent, let's look at how to generate the API keys we need.

1. Deepgram API Key

To get a Deepgram API key, go to Deepgram and sign up for an account. If you already have one, simply log in.

Once logged in, click Create API Key to generate a new key. Deepgram also offers $200 in free credits, so you can try the service without any upfront cost.

2. OpenAI API Key

To get an OpenAI API key, go to OpenAI and log in to your account. If you don't have an OpenAI account yet, sign up for one.

Once logged in, click Create new secret key to generate a new key.

Steps to Create a Voice Agent

Now we're ready to create a voice agent. In this guide, we'll build a customer support voice agent that helps users complete tasks, answers their questions, and offers personalized assistance in a natural, intuitive way. Let's get started.

Step 1: Set Up the API Keys

APIs let us connect to external services such as speech recognition or text generation. To make sure only authorized users can access these services, we authenticate with API keys. For security, it's best to store the keys in separate text files or environment variables so the program can read and load them safely when needed.

import os
from dotenv import load_dotenv

# Read the API keys from local text files (the paths are placeholders)
with open("deepgram_apikey_path", "r") as f:
    API_KEY = f.read().strip()
with open("/openai_apikey_path", "r") as f:
    OPENAI_API_KEY = f.read().strip()

load_dotenv()
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

Step 2: Define the System Instructions

The voice assistant needs clear guidelines to ensure it gives helpful, well-structured replies. These rules define the assistant's role, for example whether it acts as customer support or a personal assistant. They also set the tone and style of the responses, such as formal, casual, or professional, and you can even specify how detailed or concise the replies should be.

In this step, you write a system message that explains the agent's purpose and includes an example exchange to help it generate more accurate, relevant replies.

system_message = """ You are a customer support agent specializing in vehicle-related issues like flat tires, engine problems, and maintenance tips.
# Instructions:
- Provide clear, easy-to-follow advice.
- Keep responses between 3 to 7 sentences.
- Offer safety recommendations where necessary.
- If a problem is complex, suggest visiting a professional mechanic.
# Example:
User: "My tire is punctured, what should I do?"
Response: "First, pull over safely and turn on your hazard lights. If you have a spare tire, follow your car manual to replace it. Otherwise, call for roadside assistance. Stay in a safe location while waiting."
"""

Step 3: Audio Text Processing

To produce more natural-sounding speech, we implement a dedicated AudioTextProcessor class that splits text responses into segments:

  • The segment_text method uses a regular expression to split long replies at natural sentence boundaries.
  • This lets the TTS engine handle each sentence with appropriate pauses and intonation.
  • The result is a more human-like speech pattern and a better user experience.

import re

class AudioTextProcessor:
    @staticmethod
    def segment_text(text):
        """Split text into segments at sentence boundaries for better TTS."""
        sentence_boundaries = re.finditer(r'(?<=[.!?])\s+', text)
        boundaries_indices = [boundary.start() for boundary in sentence_boundaries]
        segments = []
        start = 0
        for boundary_index in boundaries_indices:
            segments.append(text[start:boundary_index + 1].strip())
            start = boundary_index + 1
        segments.append(text[start:].strip())
        return segments

Temporary File Management

To handle audio files cleanly and efficiently, the enhanced implementation uses Python's tempfile module (a short sketch follows the list below):

  • Temporary files are created to store audio data during playback.
  • Each audio file is cleaned up automatically after use.
  • This prevents unused files from accumulating on the system and keeps resource usage under control.
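
A minimal sketch of this pattern, following the same tempfile-plus-cleanup approach used by the welcome-message and farewell code later in this article; audio_bytes stands in for the bytes returned by the TTS API, and play_audio() is the function defined in Step 6.

import os
import tempfile

audio_bytes = b"..."  # placeholder for audio returned by the TTS API

# Create a temporary .mp3 file and write the audio into it
with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as temp_file:
    temp_path = temp_file.name

with open(temp_path, "wb") as f:
    f.write(audio_bytes)

try:
    play_audio(temp_path)  # play_audio() is defined in Step 6
finally:
    if os.path.exists(temp_path):
        os.remove(temp_path)  # delete the file so unused audio doesn't accumulate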

Threading for Non-Blocking Audio Playback

A key improvement in our implementation is the use of threading for audio playback (see the sketch after this list):

  • Audio responses are played in a separate thread, outside the main program.
  • This lets the voice agent keep listening and processing while it speaks.
  • The microphone is muted during playback to prevent feedback loops.
  • A threading.Event object (mic_muted) coordinates this behavior across threads.
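
A condensed sketch of how the playback thread and the mic_muted event cooperate in the full code; play_audio() and temp_path are as defined elsewhere in this article.

import threading

# Event shared between the transcription callback and the playback thread
mic_muted = threading.Event()

def speak(temp_path):
    mic_muted.set()  # mute the mic so the agent does not hear itself
    playback = threading.Thread(target=play_audio, args=(temp_path,))
    playback.start()  # playback runs without blocking the main loop
    # play_audio() calls mic_muted.clear() in its finally block once playback ends

# In the transcription callback, incoming audio is ignored while mic_muted is set:
# if mic_muted.is_set():
#     return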

Step 4: Implement Speech-to-Text Processing

To understand the user's commands, the voice assistant needs to convert speech into text. This is done with Deepgram's speech-to-text API, which transcribes speech in real time. It handles different languages and accents and distinguishes interim (incomplete) transcripts from final (confirmed) ones.

In this step, audio is first recorded from the microphone. The audio is then sent to Deepgram's API for processing, and the text output is received and stored for further use.

from deepgram import DeepgramClient, LiveTranscriptionEvents, LiveOptions, Microphone
import threading

# Shared state: final transcript segments and the mic-mute flag
is_finals = []
mic_muted = threading.Event()

# Initialize clients
deepgram_client = DeepgramClient(api_key=DEEPGRAM_API_KEY)

# Set up Deepgram connection
dg_connection = deepgram_client.listen.websocket.v("1")

# Define event handler callbacks
def on_open(connection, event, **kwargs):
    print("Connection Open")

def on_message(connection, result, **kwargs):
    # Ignore messages when microphone is muted for assistant's response
    if mic_muted.is_set():
        return
    sentence = result.channel.alternatives[0].transcript
    if len(sentence) == 0:
        return
    if result.is_final:
        is_finals.append(sentence)
        if result.speech_final:
            utterance = " ".join(is_finals)
            print(f"User said: {utterance}")
            is_finals.clear()
            # Process user input and generate response
            # [processing code here]

def on_speech_started(connection, speech_started, **kwargs):
    print("Speech Started")

def on_utterance_end(connection, utterance_end, **kwargs):
    if len(is_finals) > 0:
        utterance = " ".join(is_finals)
        print(f"Utterance End: {utterance}")
        is_finals.clear()

def on_close(connection, close, **kwargs):
    print("Connection Closed")

def on_error(connection, error, **kwargs):
    print(f"Handled Error: {error}")

# Register event handlers
dg_connection.on(LiveTranscriptionEvents.Open, on_open)
dg_connection.on(LiveTranscriptionEvents.Transcript, on_message)
dg_connection.on(LiveTranscriptionEvents.SpeechStarted, on_speech_started)
dg_connection.on(LiveTranscriptionEvents.UtteranceEnd, on_utterance_end)
dg_connection.on(LiveTranscriptionEvents.Close, on_close)
dg_connection.on(LiveTranscriptionEvents.Error, on_error)

# Configure live transcription options with advanced features
options = LiveOptions(
    model="nova-2",
    language="en-US",
    smart_format=True,
    encoding="linear16",
    channels=1,
    sample_rate=16000,
    interim_results=True,
    utterance_end_ms="1000",
    vad_events=True,
    endpointing=500,
)
addons = {
    "no_delay": "true"
}

Step 5: Process the Conversation

Once the assistant has transcribed the user's speech into text, it needs to analyze the text and generate an appropriate reply. For this we use an OpenAI chat model (gpt-4o-mini), which understands the context of previous messages and generates human-like responses. It also keeps track of the conversation history, helping the assistant maintain continuity.

In this step, the assistant stores the user's queries and its own replies in a conversation list. gpt-4o-mini is then used to generate a reply, which is returned as the assistant's response.

from openai import OpenAI

# Initialize OpenAI client
openai_client = OpenAI(api_key=OPENAI_API_KEY)

# Conversation history shared across turns
conversation = []

def get_ai_response(user_input):
    """Get response from OpenAI API."""
    try:
        # Add user message to conversation
        conversation.append({"role": "user", "content": user_input.strip()})
        # Prepare messages for API
        messages = [{"role": "system", "content": system_message}]
        messages.extend(conversation)
        # Get response from OpenAI
        chat_completion = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            temperature=0.7,
            max_tokens=150
        )
        # Extract and save assistant's response
        response_text = chat_completion.choices[0].message.content.strip()
        conversation.append({"role": "assistant", "content": response_text})
        return response_text
    except Exception as e:
        print(f"Error getting AI response: {e}")
        return "I'm having trouble processing your request. Please try again."

Step 6: Convert Text to Speech

The assistant should speak its response aloud rather than just displaying text. For this, Deepgram's text-to-speech API is used to convert the text into natural-sounding speech.

In this step, the assistant's text reply is sent to Deepgram's API, which processes it and returns an audio file of the speech. Finally, the audio file is played with Python's pygame library so the assistant speaks its response to the user.

import re
import requests
import pygame

class AudioTextProcessor:
    @staticmethod
    def segment_text(text):
        """Split text into segments at sentence boundaries for better TTS."""
        sentence_boundaries = re.finditer(r'(?<=[.!?])\s+', text)
        boundaries_indices = [boundary.start() for boundary in sentence_boundaries]
        segments = []
        start = 0
        for boundary_index in boundaries_indices:
            segments.append(text[start:boundary_index + 1].strip())
            start = boundary_index + 1
        segments.append(text[start:].strip())
        return segments

    @staticmethod
    def generate_audio(text, headers):
        """Generate audio using Deepgram TTS API."""
        payload = {"text": text}
        try:
            with requests.post(DEEPGRAM_TTS_URL, stream=True, headers=headers, json=payload) as r:
                r.raise_for_status()
                return r.content
        except requests.exceptions.RequestException as e:
            print(f"Error generating audio: {e}")
            return None

def play_audio(file_path):
    """Play audio file using pygame."""
    try:
        pygame.mixer.init()
        pygame.mixer.music.load(file_path)
        pygame.mixer.music.play()
        while pygame.mixer.music.get_busy():
            pygame.time.Clock().tick(10)
        # Stop the mixer and release resources
        pygame.mixer.music.stop()
        pygame.mixer.quit()
    except Exception as e:
        print(f"Error playing audio: {e}")
    finally:
        # Signal that playback is finished
        mic_muted.clear()

Step 7: Welcome and Farewell Messages

A well-designed voice agent greets the user on startup and gives a farewell message on exit, creating a more engaging interaction. This helps set a friendly tone and ensures the conversation ends smoothly.

import tempfile
import threading

# Instance of the helper class defined in Step 3 / Step 6
audio_processor = AudioTextProcessor()

def generate_welcome_message():
    """Generate welcome message audio."""
    welcome_msg = "Hello, I'm Eric, your vehicle support assistant. How can I help with your vehicle today?"
    # Create temporary file for welcome message
    with tempfile.NamedTemporaryFile(delete=False, suffix='.mp3') as welcome_file:
        welcome_path = welcome_file.name
    # Generate audio for welcome message
    welcome_audio = audio_processor.generate_audio(welcome_msg, DEEPGRAM_HEADERS)
    if welcome_audio:
        with open(welcome_path, "wb") as f:
            f.write(welcome_audio)
        # Play welcome message
        mic_muted.set()
        threading.Thread(target=play_audio, args=(welcome_path,)).start()
    return welcome_path

Microphone Management

A key improvement is proper management of the microphone during the conversation:

  • The microphone is automatically muted while the agent is speaking.
  • This prevents the agent from "hearing" its own voice.
  • A threading event object coordinates this behavior across threads.
# Mute microphone and play response
mic_muted.set()
microphone.mute()
threading.Thread(target=play_audio, args=(temp_path,)).start()
time.sleep(0.2)
microphone.unmute()

Step 8: Exit Commands

To keep the interaction smooth and intuitive, the voice agent listens for common exit commands such as "exit", "quit", "goodbye", or "bye". When an exit command is detected, the system acknowledges it and shuts down safely.

# Check for exit commands
if any(exit_cmd in utterance.lower() for exit_cmd in ["exit", "quit", "goodbye", "bye"]):
    print("Exit command detected. Shutting down...")
    farewell_text = "Thank you for using the vehicle support assistant. Goodbye!"
    with tempfile.NamedTemporaryFile(delete=False, suffix='.mp3') as temp_file:
        temp_path = temp_file.name
    farewell_audio = audio_processor.generate_audio(farewell_text, DEEPGRAM_HEADERS)
    if farewell_audio:
        with open(temp_path, "wb") as f:
            f.write(farewell_audio)
        # Mute microphone and play farewell
        mic_muted.set()
        microphone.mute()
        play_audio(temp_path)
        time.sleep(0.2)
    # Clean up and exit
    if os.path.exists(temp_path):
        os.remove(temp_path)
    # End the program
    os._exit(0)

Step 9: Error Handling and Stability

For a seamless, resilient user experience, the voice agent must handle errors gracefully. Unexpected issues such as network failures, missing audio responses, or invalid user input can interrupt the interaction if not handled properly.

Exception Handling

Try-except blocks are used throughout the code to catch and gracefully handle errors:

  • In the audio generation and playback functions.
  • During API interactions with OpenAI and Deepgram.
  • In the main event-handling loop.

try:
    # Generate audio for each segment
    with open(temp_path, "wb") as output_file:
        for segment_text in text_segments:
            audio_data = audio_processor.generate_audio(segment_text, DEEPGRAM_HEADERS)
            if audio_data:
                output_file.write(audio_data)
except Exception as e:
    print(f"Error generating or playing audio: {e}")

Resource Cleanup

Proper resource management is essential for a reliable application:

  • Temporary files are deleted after use.
  • Pygame audio resources are released properly.
  • The microphone and connection objects are closed on exit.

# Clean up
microphone.finish()
dg_connection.finish()
# Clean up welcome file
if os.path.exists(welcome_file):
    os.remove(welcome_file)

Step 10: Final Steps to Run the Voice Assistant

We need a main function that ties everything together so the voice assistant runs smoothly. The main function will:

  • Listen to the user's speech.
  • Convert the speech to text and use the AI to generate a reply.
  • Convert the reply to speech and play it back to the user.

This flow gives the assistant a complete, seamless interaction with the user.

def main():
    """Main function to run the voice assistant."""
    print("Starting Vehicle Support Voice Assistant 'Eric'...")
    print("Speak after the welcome message.")
    print("\nPress Enter to stop the assistant...\n")
    # Generate and play welcome message
    welcome_file = generate_welcome_message()
    time.sleep(0.5)  # Give time for welcome message to start
    try:
        # Initialize is_finals list to store transcription segments
        is_finals = []
        # Set up Deepgram connection
        dg_connection = deepgram_client.listen.websocket.v("1")
        # Register event handlers
        # [event registration code here]
        # Configure and start Deepgram connection
        if not dg_connection.start(options, addons=addons):
            print("Failed to connect to Deepgram")
            return
        # Start microphone
        microphone = Microphone(dg_connection.send)
        microphone.start()
        # Wait for user to press Enter to stop
        input("")
        # Clean up
        microphone.finish()
        dg_connection.finish()
        # Clean up welcome file
        if os.path.exists(welcome_file):
            os.remove(welcome_file)
        print("Assistant stopped.")
    except Exception as e:
        print(f"Error: {e}")

if __name__ == "__main__":
    main()

For the complete version of the code, refer here.

Note: Since we are currently using Deepgram's free tier, the agent's responses tend to be slower due to the limits of the free plan.

Voice Agent Use Cases

1. Customer Support Automation

Examples:

  • Banking and finance: answering queries about account balances, transactions, or credit card bills.
  • E-commerce: providing order status, return policies, or product recommendations.
  • Airlines and travel: assisting with flight bookings, cancellations, and baggage policies.

Example conversation:

  • User: When will my order ship?
  • Agent: Your order shipped on February 17 and is expected to arrive on February 20.

2. Healthcare Virtual Assistants

Examples:

  • Hospitals and clinics: booking doctor appointments.
  • Home care: reminding elderly patients to take their medication.
  • Telemedicine: providing basic symptom analysis before connecting to a doctor.

Example conversation:

  • User: I have a headache and a fever. What should I do?
  • Agent: Based on your symptoms, you may have a mild fever. Stay hydrated and get some rest. If the symptoms persist, consult a doctor.

3. In-Vehicle Voice Assistants

Examples:

  • Navigation: find the nearest gas station.
  • Music control: play my road-trip playlist.
  • Emergency help: call roadside assistance.

Example conversation:

  • User: How is traffic on my route?
  • Agent: Traffic is moderate. Your estimated arrival time is 45 minutes.

Conclusion

Voice agents are transforming communication by making interactions natural, efficient, and accessible. They have a wide range of use cases across industries such as customer support, smart homes, healthcare, and finance.

By combining speech-to-text, text-to-speech, and NLP, they can understand context, deliver intelligent responses, and handle complex tasks seamlessly. As AI advances, these systems will become even more personalized and human-like; their ability to learn from interactions will let them deliver increasingly tailored, intuitive experiences, making them valuable companions in both personal and professional settings.

Frequently Asked Questions

Q1. What is a voice agent?

A. A voice agent is an AI-driven system that processes speech, understands context, and responds intelligently using speech-to-text, NLP, and text-to-speech technology.

Q2. What are the key components of a voice agent?

A. The key components are: Speech-to-Text (STT), which converts spoken input into text; Natural Language Processing (NLP), which understands and processes the input; Text-to-Speech (TTS), which converts the text reply into human-like speech; and an AI model that generates meaningful, context-aware responses.

Q3. Where are voice agents used?

A. Voice agents are widely used in customer service, healthcare, virtual assistants, smart homes, banking, automotive support, and accessibility solutions.

Q4. Can voice agents understand different languages and accents?

A. Yes. Many advanced voice agents support multiple languages and accents, improving accessibility and the user experience worldwide.

Q5. Will voice agents replace human support agents?

A. No. Voice agents are designed to assist and augment human agents by handling repetitive tasks, letting human agents focus on complex issues.
