逍遥之韶

OpenCV Example: Handwritten Digit Recognition

2023-09-15


1. Introduction

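The MNIST dataset consists of images of the handwritten digits 0 to 9 together with their labels. It contains 60,000 training samples and 10,000 test samples, and each sample is a 28 * 28 pixel grayscale image of one handwritten digit. This article uses the K-nearest neighbors (KNN) method to train on the MNIST dataset and classify its test images.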

[Figure: MNIST data]
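To make the idea behind K-nearest neighbors concrete, here is a minimal pure-Python sketch: find the K training points closest to a query and take a majority vote over their labels. The function name `knn_predict` and the toy 2-D points are made up for illustration, and the tie-breaking behaviour is an assumption that need not match OpenCV's implementation.

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Squared Euclidean distance is enough for ranking neighbours
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters labelled 0 and 1
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_x, train_y, (0.15, 0.1)))  # 0
print(knn_predict(train_x, train_y, (0.95, 1.0)))  # 1
```

For MNIST the same idea applies with 784-dimensional vectors instead of 2-D points, which is why each image is flattened into a single row before training.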

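2. Data Loading

The struct module is used to read the MNIST image data.
Loading yields four variables:
- train_images: read from train-images.idx3-ubyte, uint8, shape (60000, 784); each row holds the pixels of one image
- train_labels: read from train-labels.idx1-ubyte, uint8, shape (60000, ); each entry is the label of one image
- test_images: read from t10k-images.idx3-ubyte, uint8, shape (10000, 784); each row holds the pixels of one image
- test_labels: read from t10k-labels.idx1-ubyte, uint8, shape (10000, ); each entry is the label of one image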
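The binary layout that struct has to decode can be illustrated on a tiny in-memory buffer. An IDX image file starts with four big-endian 32-bit integers (magic number 2051, image count, rows, columns) followed by the raw pixel bytes. The sketch below is only an illustration: the helper name `parse_idx_images` and the 2 x 2 "images" are hypothetical, and it works on bytes rather than on the real MNIST files.

```python
import struct

def parse_idx_images(buf):
    """Parse an IDX3 image buffer into a list of flat pixel rows."""
    magic, num, rows, cols = struct.unpack('>IIII', buf[:16])
    if magic != 2051:  # 0x00000803: unsigned-byte data with 3 dimensions
        raise ValueError('unexpected magic number: %d' % magic)
    size = rows * cols
    pixels = buf[16:]
    if len(pixels) != num * size:
        raise ValueError('payload size does not match header')
    # One flat row of rows * cols pixel values per image
    return [list(pixels[i * size:(i + 1) * size]) for i in range(num)]

# Hypothetical buffer holding two 2 x 2 images, built only for this example
fake = struct.pack('>IIII', 2051, 2, 2, 2) + bytes([0, 255, 255, 0, 1, 2, 3, 4])
images = parse_idx_images(fake)
print(len(images), len(images[0]))  # 2 4
```

Label files follow the same pattern with a shorter 8-byte header (magic number 2049 and item count), which is why the loader reads 8 bytes for labels and 16 bytes for images.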

import os
import time
import struct
import numpy as np
import matplotlib.pyplot as plt
import cv2

def load_mnist(path, kind='train'):
    # kind is 'train' for the training set or 't10k' for the test set
    labels_path = os.path.join(path, '%s-labels.idx1-ubyte' % kind)
    images_path = os.path.join(path, '%s-images.idx3-ubyte' % kind)
    with open(labels_path, 'rb') as lbpath:
        # 8-byte big-endian header: magic number, item count
        magic, n = struct.unpack('>II', lbpath.read(8))
        labels = np.fromfile(lbpath, dtype=np.uint8)
    with open(images_path, 'rb') as imgpath:
        # 16-byte big-endian header: magic number, image count, rows, cols
        magic, num, rows, cols = struct.unpack('>IIII', imgpath.read(16))
        # Flatten each 28 * 28 image into one row of 784 pixels
        images = np.fromfile(imgpath, dtype=np.uint8).reshape(len(labels), 784)
    return images, labels

# Load the MNIST data
train_images, train_labels = load_mnist(".", "train")
test_images, test_labels = load_mnist(".", "t10k")

# Display the first 25 test images with their labels
fig = plt.figure(figsize=(8, 8))
fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
for i in range(25):
    image = np.reshape(test_images[i], [28, 28])
    ax = fig.add_subplot(5, 5, i + 1, xticks=[], yticks=[])
    ax.imshow(image, cmap=plt.cm.binary, interpolation='nearest')
    ax.text(0, 7, str(test_labels[i]))
plt.show()

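3. Data Reshaping

Reshape train_labels and test_labels into two-dimensional arrays (one label per row).
Convert all four arrays (train_images, train_labels, test_images, test_labels) to floating point so they can be used for KNN training.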

# Reshape the labels into column vectors
train_labels = train_labels.reshape(len(train_labels), 1)
test_labels = test_labels.reshape(len(test_labels), 1)

# Convert everything to float32
train_images = train_images.astype(np.float32)
train_labels = train_labels.astype(np.float32)
test_images = test_images.astype(np.float32)
test_labels = test_labels.astype(np.float32)

4. `KNN Training`

# Training
knn = cv2.ml.KNearest_create()
start = time.time()
knn.train(train_images, cv2.ml.ROW_SAMPLE, train_labels)
end = time.time()
print("Training time: %d ms" % ((end - start) * 1000))

Training time: 29 ms

5. `KNN Inference`

Inference takes about 1.09 ms per image on average (10,861 ms for the 10,000 test images).

# Inference: classify the test images with the K-nearest neighbors algorithm
K = 11
start = time.time()
ret, inferences, neighbours, dist = knn.findNearest(test_images, K)
end = time.time()
print("Inference time: %d ms" % ((end - start) * 1000))

Inference time: 10861 ms
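The per-image figure is simply the total wall-clock time divided by the number of test images. A small sketch of that measurement pattern follows; the helper `time_batch` and the stand-in workload are hypothetical and take the place of the `knn.findNearest` call, and `time.perf_counter` is used here only because it is a monotonic clock intended for interval timing.

```python
import time

def time_batch(fn, items):
    """Return (total_ms, ms_per_item) for one call of fn on the whole batch."""
    start = time.perf_counter()
    fn(items)
    total_ms = (time.perf_counter() - start) * 1000
    return total_ms, total_ms / len(items)

# Stand-in workload (hypothetical); in the article this would be the KNN lookup
total_ms, per_item_ms = time_batch(lambda xs: [x * x for x in xs], list(range(10000)))
print("total: %.2f ms, per item: %.5f ms" % (total_ms, per_item_ms))

# Arithmetic behind the figure reported in this section
print(10861 / 10000)  # 1.0861 ms per image
```

Timing the whole batch in one call, as the article does, includes any per-call overhead only once, so the average can differ from what a loop of single-image queries would show.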

6. Performance Statistics

Compute the accuracy for each digit and the overall accuracy.

# Tally results: cnt[d][0] = correct predictions for digit d, cnt[d][1] = samples of digit d
cnt = np.zeros(shape=(10, 2), dtype="uint32")
for idx in range(len(test_labels)):
    digit = int(test_labels[idx][0])
    # Labels are stored as float32, so compare with a small tolerance
    if np.abs(inferences[idx][0] - test_labels[idx][0]) < 0.1:
        cnt[digit][0] += 1
    cnt[digit][1] += 1

for n in range(0, 10):
    print("Accuracy for digit %d: %3.2f %%" % (n, 100 * cnt[n][0] / cnt[n][1]))

total = np.sum(cnt, axis=0)
print("Overall accuracy: %3.2f %%" % (100 * total[0] / total[1]))

Accuracy for digit 0: 99.18 %
Accuracy for digit 1: 99.74 %
Accuracy for digit 2: 94.96 %
Accuracy for digit 3: 96.44 %
Accuracy for digit 4: 95.52 %
Accuracy for digit 5: 97.09 %
Accuracy for digit 6: 98.43 %
Accuracy for digit 7: 95.82 %
Accuracy for digit 8: 93.53 %
Accuracy for digit 9: 95.84 %
Overall accuracy: 96.68 %
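Per-digit accuracy shows which digits are hard, but not what they are mistaken for. One possible follow-up, sketched here under assumptions, is to count (true, predicted) pairs for the misclassified samples. The helper `confusion_counts` is hypothetical and the label lists are made-up sample data, not results from the run above.

```python
from collections import Counter

def confusion_counts(true_labels, predicted_labels):
    """Count (true, predicted) pairs for misclassified samples only."""
    pairs = ((int(round(t)), int(round(p))) for t, p in zip(true_labels, predicted_labels))
    return Counter((t, p) for t, p in pairs if t != p)

# Made-up labels for illustration; float values mimic the float32 labels used for KNN
true_labels = [8.0, 8.0, 8.0, 3.0, 3.0, 7.0, 7.0, 2.0]
predicted = [8.0, 3.0, 3.0, 3.0, 5.0, 7.0, 1.0, 2.0]

errors = confusion_counts(true_labels, predicted)
for (t, p), n in errors.most_common():
    print("true %d predicted as %d: %d" % (t, p, n))
```

With the arrays from this article, the inputs would be the first column of test_labels and of inferences.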
