Keyword spotting (KWS) is a key technology for human-computer interaction through speech, but the field is relatively hard to get into and reference material is scarce. This post pulls together my notes as an introduction to KWS and a summary of what I have learned so far.

Development Environment Setup

Anaconda

The installation steps are omitted here; what deserves attention is adding and configuring the environment variables. Commonly used commands are shown below.

# create a conda environment named "env-name" with python 3.7
conda create -n env-name python=3.7

# remove the environment named "env-name"
conda remove -n env-name --all
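
A few more commands that come up constantly; the conda CLI is the same, and env-name is just a placeholder:

# activate / deactivate an environment
conda activate env-name
conda deactivate

# list all environments on this machine
conda env list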

VS Code

VS Code works well as a development environment: it is lightweight and highly customizable.

I wanted the integrated terminal to activate a specific Anaconda environment as soon as it opens, but the environment always got activated one extra time, which made spawning a new terminal noticeably slow. The lesson learned: before adding settings of your own, it is good practice to check which options the default configuration already provides.

{
    "terminal.integrated.defaultProfile.windows": "Command Prompt",
    "terminal.integrated.profiles.windows": {
        "PowerShell": {
            "source": "PowerShell",
            "icon": "terminal-powershell"
        },
        "Command Prompt": {
            "path": [
                "${env:windir}\\System32\\cmd.exe",
                "${env:windir}\\Sysnative\\cmd.exe"
            ],
            "args": ["/k", "conda activate py37"],
            "icon": "terminal-cmd"
        },
        "Git Bash": {
            "source": "Git Bash"
        }
    },
    "python.condaPath": "D:\\Anaconda3\\Scripts\\conda.exe",
    "editor.renderLineHighlight": "all",
    "python.terminal.activateEnvironment": false, // set this configuration to false
}
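
The key is the last line: with python.terminal.activateEnvironment set to false, VS Code no longer runs its own activation command on top of the /k conda activate py37 already baked into the Command Prompt profile, so each new terminal activates the environment exactly once.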

PyTorch Crash Course

Core Ideas

  • Any network class you define should subclass torch.nn.Module;
  • a custom class should generally implement at least the constructor and the forward-propagation method forward();
  • pay attention to the use of optimizer.zero_grad() (see the sketch after this list).
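
A minimal sketch tying these three points together; TinyNet and the random tensors are toy placeholders, not part of any real task:

import torch
import torch.nn as nn

# a custom module subclasses nn.Module and implements __init__ and forward()
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 1)

    def forward(self, x):
        return self.fc(x)

model = TinyNet()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

x = torch.randn(8, 4)  # toy inputs
y = torch.randn(8, 1)  # toy targets

optimizer.zero_grad()        # clear gradients left over from the previous step
loss = loss_fn(model(x), y)  # forward pass (calls forward() via __call__)
loss.backward()              # accumulate gradients into each parameter's .grad
optimizer.step()             # apply the SGD update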

Example Code

Gradient Descent

The following code is a NumPy implementation of gradient descent that fits a cosine function with a quartic polynomial.
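
For reference, the gradients used in the update steps below follow directly from the model $\hat{y} = a + bx^2 + cx^4$ and the sum-of-squares loss:

$$L = \sum_i (\hat{y}_i - y_i)^2, \qquad \frac{\partial L}{\partial b} = \sum_i 2(\hat{y}_i - y_i)\,x_i^2, \qquad \frac{\partial L}{\partial c} = \sum_i 2(\hat{y}_i - y_i)\,x_i^4.$$

The coefficient a stays fixed at 1 (matching cos 0 = 1), so only b and c are updated.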

import numpy as np
import matplotlib.pyplot as plt
import math

f = np.cos

x = np.linspace(-math.pi / 4, math.pi / 4, 2000)
y = f(x)

# coefficients of the quartic model y = a + b x^2 + c x^4;
# a is kept fixed at 1 because cos(0) = 1
a = 1
b = -10
c = 20

learning_rate = 1e-3

for epoch in range(4000):
    y_pred = a + b * x ** 2 + c * x ** 4
    loss = np.square(y_pred - y).sum()  # sum-of-squares error
    if epoch % 100 == 99:
        print(f'Epoch {epoch}, SSE Loss: {loss}.')

    # gradients of the loss with respect to b and c
    grad_y_pred = 2 * (y_pred - y)
    grad_b = (grad_y_pred * x ** 2).sum()
    grad_c = (grad_y_pred * x ** 4).sum()

    b -= grad_b * learning_rate
    c -= grad_c * learning_rate

y_pred = a + b * x ** 2 + c * x ** 4
plt.plot(x, y, 'g')
plt.plot(x, y_pred, 'r')
plt.legend(labels=("y = cos x", f"1+{b}x^2+{c}x^4"))
plt.show()
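
As a sanity check, the truncated Taylor expansion cos x ≈ 1 − x²/2 + x⁴/24 suggests the fit should land near b ≈ −0.5 and c ≈ 1/24 ≈ 0.042; the least-squares optimum on [−π/4, π/4] differs only slightly.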

Training a Simple Neural Network on the MNIST Dataset

import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torchvision
from torch.utils.data import DataLoader

batch_size = 64
train_loss = []

# two DataLoader instances: the training and test splits of MNIST
train_dataloader = DataLoader(
    torchvision.datasets.MNIST(
        './data/',
        train=True,
        download=False,  # set to True on the first run to fetch the dataset
        transform=torchvision.transforms.Compose([
            torchvision.transforms.ToTensor(),
            torchvision.transforms.Normalize(
                (0.1307,),
                (0.3081,)
            )
        ])
    ),
    batch_size=batch_size,
    shuffle=True
)
test_dataloader = DataLoader(
    torchvision.datasets.MNIST(
        './data/',
        train=False,
        download=False,
        transform=torchvision.transforms.Compose([
            torchvision.transforms.ToTensor(),
            torchvision.transforms.Normalize(
                (0.1307,),
                (0.3081,)
            )
        ])
    ),
    batch_size=batch_size,
    shuffle=True
)

device = "cuda" if torch.cuda.is_available() else "cpu"

# remember to subclass torch.nn.Module
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        return self.linear_relu_stack(x)

model = NeuralNetwork().to(device)
print(model)

def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)
        '''
        here the model is called; see module.py (line 1124)
        for more detail about the implementation of __call__
        '''
        pred = model(X)
        loss = loss_fn(pred, y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            # .item() extracts the Python scalar from a one-element tensor
            loss, current = loss.item(), batch * len(X)
            train_loss.append(loss)
            print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")

def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)

# loss was recorded every 100 batches, starting at batch 0
plt.plot([100 * i for i in range(len(train_loss))], train_loss, 'r')
plt.title('Train Loss')
plt.show()

Basics of the Research Area

Data Processing

Basic System Framework

Classic Papers

Small-Footprint Keyword Spotting Using Deep Neural Networks (G. Chen, C. Parada, G. Heigold, ICASSP 2014)

Common Tools

  • ESPnet
  • WeKws
  • PyTorch

References
