
Agricultural Image Segmentation with an Improved U-Net

Main Objective

Train a semantic segmentation model for agricultural remote-sensing images based on an improved U-Net.

Experimental Procedure

Basic Principles

Image segmentation partitions an image into regions according to its content. It classifies every pixel, so object contours are delineated precisely rather than approximated by bounding boxes as in detection. By purpose, image segmentation falls into three kinds:

Ordinary segmentation: does not care what the objects are; it only draws the boundaries between them, separating the pixel regions belonging to different objects, e.g. separating foreground from background.

Semantic segmentation: builds on ordinary segmentation by assigning a class label to each region, identifying what object the region contains.

Instance segmentation: builds on semantic segmentation by separating individual objects, so that objects of the same class receive distinct ids, e.g. distinguishing the different people in a scene.

Goal of this experiment: perform semantic segmentation on agricultural big-data images, classifying the crops in different regions. Whichever kind of segmentation is used, the overall pipeline is much the same; the basic workflow and training procedure are outlined below.

Basic Image Segmentation Pipeline

Downsampling and upsampling

Convolution: extract multi-level image features with convolutional layers.
Deconvolution or resize: restore the spatial resolution of the image by deconvolution or resampling.

Multi-scale feature fusion

Element-wise addition: feature maps from different layers are added point by point, fusing multi-level features.
Channel-wise concatenation: feature maps from different layers are concatenated along the channel dimension, preserving the information of every level.

Producing a pixel-level segmentation map

Each pixel is classified individually, producing a class label for every pixel.
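As a hypothetical illustration of this pixel-wise classification step (the array sizes and scores below are invented for the example), per-class score maps can be collapsed into a segmentation map with an argmax over the class axis:

```python
import numpy as np

# Turn a (num_classes, H, W) stack of per-class scores into a per-pixel
# label map by taking the argmax over the class axis.
def scores_to_segmentation_map(scores):
    """scores: float array of shape (num_classes, H, W)."""
    return np.argmax(scores, axis=0).astype(np.uint8)

scores = np.zeros((3, 2, 2), dtype=float)
scores[1, 0, 0] = 1.0   # pixel (0, 0) scores highest for class 1
scores[2, 1, 1] = 1.0   # pixel (1, 1) scores highest for class 2
seg = scores_to_segmentation_map(scores)
```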

Training procedure

Input: image + label (the input image and its label)

Forward pass: out = model(image) (run the model forward to produce predictions)

Loss: loss = loss_func(out, label) (measure the difference between predictions and labels with the loss function)

Backpropagation: loss.backward() (compute the gradients)

Weight update: optimizer.minimize(loss) (update the model weights with the optimizer)
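The cycle above can be sketched with a toy one-parameter model in plain numpy (an illustrative sketch of forward/loss/gradient/update, not the lab's actual U-Net training code):

```python
import numpy as np

# One gradient-descent step on a toy linear model y = w * x with a
# squared-error loss, mirroring the forward / loss / backward / update steps.
def train_step(w, x, label, lr=0.1):
    out = w * x                    # forward pass: out = model(image)
    loss = (out - label) ** 2      # loss = loss_func(out, label)
    grad = 2 * (out - label) * x   # backward pass: d(loss)/d(w)
    w = w - lr * grad              # optimizer update
    return w, loss

x, label = 1.0, 2.0
w = 0.0
w, loss0 = train_step(w, x, label)   # loss0 = 4.0
w, loss1 = train_step(w, x, label)   # loss shrinks as w moves toward 2.0
```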

Importing Packages

To implement semantic segmentation of agricultural remote-sensing images with the improved U-Net, we first import a set of packages for data processing, visualization, and model training.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from keras.models import Sequential
from keras.layers import *
from keras.utils.np_utils import to_categorical
from keras.preprocessing.image import img_to_array
from keras.callbacks import ModelCheckpoint
from sklearn.preprocessing import LabelEncoder
from keras.models import Model
from keras.models import load_model
from keras.layers.merge import concatenate
from PIL import Image
import cv2
import random
import os
from tqdm import tqdm

os.environ["CUDA_VISIBLE_DEVICES"] = "4"
%matplotlib inline
```

Preparing the Data

Large-scale crop satellite remote-sensing data

The data come from UAV remote-sensing imagery. As an important part of spatial information technology, UAV remote sensing both complements satellite imagery and can effectively replace manual field surveys. By reducing the labor and cost of ground surveys and quickly acquiring real-time high-resolution data, it has become an innovation in agricultural statistical survey work and an important direction for precision agriculture.

Data source

Tianchi Competition - Crop Remote-Sensing Imagery Dataset

Data visualization

The crop-type annotation is a uint8 single-channel image. Each pixel value indicates the class of the corresponding position in the original image: flue-cured tobacco = 1, corn = 2, coix seed = 3, man-made structures = 4, and all remaining positions are treated as "other" = 0.
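A small numpy sketch of how such a uint8 label image can be interpreted with the mapping above (the tiny example array is invented for illustration, not taken from the dataset):

```python
import numpy as np

# Pixel-value-to-crop mapping as described in the annotation scheme.
CLASS_NAMES = {0: "other", 1: "tobacco", 2: "corn", 3: "coix seed", 4: "building"}

# A toy 2x3 label image for illustration.
label_img = np.array([[0, 1, 1],
                      [2, 2, 4]], dtype=np.uint8)

# Count how many pixels belong to each class.
values, counts = np.unique(label_img, return_counts=True)
pixel_counts = {CLASS_NAMES[v]: int(c) for v, c in zip(values, counts)}
```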

Randomly select 18 images from the training set and visualize them with matplotlib to show what the dataset looks like.

```python
directory = "/root/data/Lab4/Lab4-data/UNetP_data/train_data"
images = random.choices(os.listdir(directory), k=18)

fig = plt.figure(figsize=(20, 10))
columns = 6
rows = 3

for x, i in enumerate(images):
    path = os.path.join(directory, i)
    img = plt.imread(path)
    fig.add_subplot(rows, columns, x + 1)
    plt.imshow(img)

plt.show()
```

[Figure: 18 randomly sampled training images]

Setting Up Data and Labels

Labels are processed with scikit-learn's LabelEncoder, which normalizes label values into a contiguous range.

```python
width = 512
height = 512
classes = [0., 1., 2., 3., 4.]
labelencoder = LabelEncoder()
labelencoder.fit(classes)
```

Image loading function

Loads the image at the given path. Grayscale images are read directly; RGB images are additionally normalized to [0, 1].

```python
def load_img(path, grayscale=False):
    if grayscale:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    else:
        img = cv2.imread(path)
        img = np.array(img, dtype="float") / 255.0
    return img
```

Defining Training Parameters and Data

Set the random seed

```python
seed = 7
np.random.seed(seed)
```

Define the validation split: 1/4 of the training set is held out for validation.

```python
def get_train_val(val_rate=0.25):
    train_url = []
    train_set = []
    val_set = []
    for pic in os.listdir('/root/data/Lab4/Lab4-data/UNetP_data/train_data/'):
        train_url.append(pic)
    random.shuffle(train_url)
    total_num = len(train_url)
    val_num = int(val_rate * total_num)
    for i in range(len(train_url)):
        if i < val_num:
            val_set.append(train_url[i])
        else:
            train_set.append(train_url[i])
    return train_set, val_set
```

Training data generator

```python
def generateData(batch_size, data=[]):
    while True:
        train_data = []
        train_label = []
        batch = 0
        for i in range(len(data)):
            url = data[i]
            batch += 1
            img = load_img('/root/data/Lab4/Lab4-data/UNetP_data/train_data/' + url)
            img = img_to_array(img)
            train_data.append(img)
            label = load_img('/root/data/Lab4/Lab4-data/UNetP_data/train_label/' + url, grayscale=True)
            label = img_to_array(label)
            train_label.append(label)
            if batch % batch_size == 0:
                train_data = np.array(train_data)
                train_label = np.array(train_label)
                yield (train_data, train_label)
                train_data = []
                train_label = []
                batch = 0
```

Validation data generator

```python
def generateValidData(batch_size, data=[]):
    while True:
        valid_data = []
        valid_label = []
        batch = 0
        for i in range(len(data)):
            url = data[i]
            batch += 1
            img = load_img('/root/data/Lab4/Lab4-data/UNetP_data/train_data/' + url)
            img = img_to_array(img)
            valid_data.append(img)
            label = load_img('/root/data/Lab4/Lab4-data/UNetP_data/train_label/' + url, grayscale=True)
            label = img_to_array(label)
            valid_label.append(label)
            if batch % batch_size == 0:
                valid_data = np.array(valid_data)
                valid_label = np.array(valid_label)
                yield (valid_data, valid_label)
                valid_data = []
                valid_label = []
                batch = 0
```

Model Construction, Training, and Improvement

Introduction to the U-Net architecture

As its name suggests, the U-Net architecture is U-shaped and is built from convolution and pooling units. The left half is the encoder, a "downsampling stage" like a conventional classification network; the right half is the decoder, the "upsampling stage". The gray arrows in the middle are skip connections, which concatenate shallow features with deep features. Shallow layers typically capture simple image features such as edges and colors, while deeper layers, after more convolutions, capture more abstract features; exploiting both together works well. The U-Net architecture is shown below:


[Figure: U-Net architecture]
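The encoder/decoder shapes and the skip-connection concatenation can be illustrated with a shape-only numpy sketch (channels-last layout assumed for clarity; the real network interleaves convolutions between these steps):

```python
import numpy as np

# 2x2 max pooling: halves the spatial dimensions.
def pool2x2(x):
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

# 2x2 nearest-neighbor upsampling: doubles the spatial dimensions.
def upsample2x2(x):
    return x.repeat(2, axis=0).repeat(2, axis=1)

shallow = np.random.rand(8, 8, 32)             # an encoder feature map
deep = pool2x2(shallow)                        # (4, 4, 32) after pooling
up = upsample2x2(deep)                         # back to (8, 8, 32)
skip = np.concatenate([up, shallow], axis=-1)  # channel concat: (8, 8, 64)
```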

Training function

```python
def train(mselect):
    EPOCHS = 10
    BS = 16
    if mselect == 'UNet':
        model = unet()
        modelcheck = ModelCheckpoint('model.h5', monitor='val_accuracy',
                                     save_best_only=True, mode='max')
    else:
        model = unet_new()
        modelcheck = ModelCheckpoint('/root/data/Lab4/Lab4-data/models/model_new_{epoch:02d}_{val_accuracy:.2f}.h5',
                                     monitor='val_accuracy', save_best_only=False,
                                     mode='max', save_freq='epoch')
    callbacks = [modelcheck]
    train_set, val_set = get_train_val()
    train_numb = len(train_set)
    valid_numb = len(val_set)
    print("the number of train data is", train_numb)
    print("the number of val data is", valid_numb)
    H = model.fit_generator(generator=generateData(BS, train_set),
                            steps_per_epoch=train_numb // BS,
                            epochs=EPOCHS, verbose=1,
                            validation_data=generateValidData(BS, val_set),
                            validation_steps=valid_numb // BS,
                            callbacks=callbacks, max_queue_size=1)

    # plot the training loss and accuracy
    plt.style.use("ggplot")
    plt.figure()
    N = EPOCHS
    plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
    plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
    plt.plot(np.arange(0, N), H.history["accuracy"], label="train_acc")
    plt.plot(np.arange(0, N), H.history["val_accuracy"], label="val_acc")
    if mselect == 'UNet':
        plt.title("Training Loss and Accuracy on U-Net Satellite Seg")
    else:
        plt.title("Training Loss and Accuracy on U-Net_new Satellite Seg")
    plt.xlabel("Epoch #")
    plt.ylabel("Loss/Accuracy")
    plt.legend(loc="lower left")
    plt.savefig('/root/data/Lab4/Lab4-data/plot.png')
    plt.show()
```

Definition of the unet network

Pooling uses max pooling, the activation function is ReLU, and upsampling is done with Keras's UpSampling2D method.

```python
def unet():
    inputs = Input((3, width, height))

    conv1 = Conv2D(32, (3, 3), activation="relu", padding="same")(inputs)
    conv1 = Conv2D(32, (3, 3), activation="relu", padding="same")(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)

    conv2 = Conv2D(64, (3, 3), activation="relu", padding="same")(pool1)
    conv2 = Conv2D(64, (3, 3), activation="relu", padding="same")(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)

    conv3 = Conv2D(128, (3, 3), activation="relu", padding="same")(pool2)
    conv3 = Conv2D(128, (3, 3), activation="relu", padding="same")(conv3)
    pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)

    conv4 = Conv2D(256, (3, 3), activation="relu", padding="same")(pool3)
    conv4 = Conv2D(256, (3, 3), activation="relu", padding="same")(conv4)
    pool4 = MaxPooling2D(pool_size=(2, 2))(conv4)

    conv5 = Conv2D(512, (3, 3), activation="relu", padding="same")(pool4)
    conv5 = Conv2D(512, (3, 3), activation="relu", padding="same")(conv5)

    # decoder: upsample and concatenate with the matching encoder features
    # (axis=1 is the channel axis for this channels-first input)
    up6 = concatenate([UpSampling2D(size=(2, 2))(conv5), conv4], axis=1)
    conv6 = Conv2D(256, (3, 3), activation="relu", padding="same")(up6)
    conv6 = Conv2D(256, (3, 3), activation="relu", padding="same")(conv6)

    up7 = concatenate([UpSampling2D(size=(2, 2))(conv6), conv3], axis=1)
    conv7 = Conv2D(128, (3, 3), activation="relu", padding="same")(up7)
    conv7 = Conv2D(128, (3, 3), activation="relu", padding="same")(conv7)

    up8 = concatenate([UpSampling2D(size=(2, 2))(conv7), conv2], axis=1)
    conv8 = Conv2D(64, (3, 3), activation="relu", padding="same")(up8)
    conv8 = Conv2D(64, (3, 3), activation="relu", padding="same")(conv8)

    up9 = concatenate([UpSampling2D(size=(2, 2))(conv8), conv1], axis=1)
    conv9 = Conv2D(32, (3, 3), activation="relu", padding="same")(up9)
    conv9 = Conv2D(32, (3, 3), activation="relu", padding="same")(conv9)

    conv10 = Conv2D(1, (1, 1), activation="sigmoid")(conv9)
    # conv10 = Conv2D(n_label, (1, 1), activation="softmax")(conv9)

    model = Model(inputs=inputs, outputs=conv10)
    model.compile(optimizer='Adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model
```

Training the unet network

[Figures: unet training log and loss/accuracy curves]

Model Improvements

We added a downsampling level to the original unet, specified the random weight initialization (He initialization), switched pooling to average pooling, and inserted BatchNormalization so that gradient-based training proceeds more effectively: convergence is faster and the vanishing/exploding-gradient problems that can stall training are alleviated. We also added a Dropout mechanism, which effectively prevents overfitting.
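Batch normalization itself is simple to state: each feature is normalized to zero mean and unit variance over the batch, then scaled and shifted by learned parameters. A minimal numpy sketch (gamma and beta fixed here for illustration, whereas the Keras layer learns them):

```python
import numpy as np

# Normalize each column (feature) over the batch axis, then scale/shift.
def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.array([[1.0, 10.0],
              [3.0, 30.0]])
y = batch_norm(x)  # each column now has ~zero mean and ~unit variance
```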

```python
def unet_new():
    inputs = Input((width, height, 3))  # improvement: channels-last input
    conv1 = Conv2D(8, 3, activation='relu', padding='same', kernel_initializer='he_normal')(inputs)
    pool1 = AveragePooling2D(pool_size=(2, 2))(conv1)

    conv2 = BatchNormalization(momentum=0.99)(pool1)
    conv2 = Conv2D(64, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv2)
    conv2 = BatchNormalization(momentum=0.99)(conv2)
    conv2 = Conv2D(64, 1, activation='relu', padding='same', kernel_initializer='he_normal')(conv2)
    conv2 = Dropout(0.02)(conv2)
    pool2 = AveragePooling2D(pool_size=(2, 2))(conv2)

    conv3 = BatchNormalization(momentum=0.99)(pool2)
    conv3 = Conv2D(128, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv3)
    conv3 = BatchNormalization(momentum=0.99)(conv3)
    conv3 = Conv2D(128, 1, activation='relu', padding='same', kernel_initializer='he_normal')(conv3)
    conv3 = Dropout(0.02)(conv3)
    pool3 = AveragePooling2D(pool_size=(2, 2))(conv3)

    conv4 = BatchNormalization(momentum=0.99)(pool3)
    conv4 = Conv2D(256, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv4)
    conv4 = BatchNormalization(momentum=0.99)(conv4)
    conv4 = Conv2D(256, 1, activation='relu', padding='same', kernel_initializer='he_normal')(conv4)
    conv4 = Dropout(0.02)(conv4)
    pool4 = AveragePooling2D(pool_size=(2, 2))(conv4)

    conv5 = BatchNormalization(momentum=0.99)(pool4)
    conv5 = Conv2D(512, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv5)
    conv5 = BatchNormalization(momentum=0.99)(conv5)
    conv5 = Conv2D(512, 1, activation='relu', padding='same', kernel_initializer='he_normal')(conv5)
    conv5 = Dropout(0.02)(conv5)

    # note: pool4 is re-bound here from pool3, so the conv4/conv5 branch above
    # does not reach the output (Keras keeps only layers on the input-output path)
    pool4 = AveragePooling2D(pool_size=(2, 2))(pool3)
    pool5 = AveragePooling2D(pool_size=(2, 2))(pool4)

    conv6 = BatchNormalization(momentum=0.99)(pool5)
    conv6 = Conv2D(256, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv6)

    conv7 = Conv2D(256, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv6)
    up7 = UpSampling2D(size=(2, 2))(conv7)
    conv7 = Conv2D(256, 3, activation='relu', padding='same', kernel_initializer='he_normal')(up7)
    merge7 = concatenate([pool4, conv7], axis=3)

    conv8 = Conv2D(128, 3, activation='relu', padding='same', kernel_initializer='he_normal')(merge7)
    up8 = UpSampling2D(size=(2, 2))(conv8)
    conv8 = Conv2D(128, 3, activation='relu', padding='same', kernel_initializer='he_normal')(up8)
    merge8 = concatenate([pool3, conv8], axis=3)

    conv9 = Conv2D(64, 3, activation='relu', padding='same', kernel_initializer='he_normal')(merge8)
    up9 = UpSampling2D(size=(2, 2))(conv9)
    conv9 = Conv2D(64, 3, activation='relu', padding='same', kernel_initializer='he_normal')(up9)
    merge9 = concatenate([pool2, conv9], axis=3)

    conv10 = Conv2D(32, 3, activation='relu', padding='same', kernel_initializer='he_normal')(merge9)
    up10 = UpSampling2D(size=(2, 2))(conv10)
    conv10 = Conv2D(32, 3, activation='relu', padding='same', kernel_initializer='he_normal')(up10)

    conv11 = Conv2D(16, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv10)
    up11 = UpSampling2D(size=(2, 2))(conv11)
    conv11 = Conv2D(8, 3, activation='relu', padding='same', kernel_initializer='he_normal')(up11)

    conv12 = Conv2D(1, 1, activation='relu', padding='same', kernel_initializer='he_normal')(conv11)

    model = Model(inputs=inputs, outputs=conv12)
    model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
    return model
```

Model structure

[Figures: unet_new model summary]

Training the improved model

As the plots below show, after the improvements training converges rapidly and accuracy improves markedly.

[Figures: unet_new training log and loss/accuracy curves]

Using the Model

Import packages

```python
import cv2
import random
import numpy as np
import os
from keras.preprocessing.image import img_to_array
from keras.models import load_model
from sklearn.preprocessing import LabelEncoder
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```

Set up the test data format and labels

```python
image_size = 512
classes = [0., 1., 2., 3., 4.]
TEST_SET = ['/root/data/Lab4/Lab4-data/test.png']
labelencoder = LabelEncoder()
labelencoder.fit(classes)
```

Define the prediction function and load the trained model

```python
def predict():
    # load the trained convolutional neural network
    print("[INFO] loading network...")
    model = load_model('/root/data/Lab4/Lab4-data/model.h5')
    stride = 512
    for n in range(len(TEST_SET)):
        path = TEST_SET[n]
        # load the image
        image = cv2.imread(path)
        h, w, _ = image.shape

        # pad the image so both sides are multiples of the stride
        padding_h = (h // stride + 1) * stride
        padding_w = (w // stride + 1) * stride
        padding_img = np.zeros((padding_h, padding_w, 3), dtype=np.uint8)
        padding_img[0:h, 0:w, :] = image[:, :, :]
        padding_img = img_to_array(padding_img)

        print('src:', padding_img.shape)
        mask_whole = np.zeros((padding_h, padding_w), dtype=np.uint8)
        # slide a 512x512 window over the padded image and predict each tile
        for i in range(padding_h // stride):
            for j in range(padding_w // stride):
                crop = padding_img[i*stride:i*stride+image_size, j*stride:j*stride+image_size, :3]
                crop = np.expand_dims(crop, axis=0)
                # the model expects channels-first input, hence the transpose
                pred = model.predict(crop.transpose(0, 3, 1, 2), verbose=2)
                pred = pred.reshape((512, 512)).astype(np.uint8)
                mask_whole[i*stride:i*stride+image_size, j*stride:j*stride+image_size] = pred[:, :]

        cv2.imwrite('pre' + str(n + 1) + '.png', mask_whole[0:h, 0:w])
```

Generate the prediction

```python
from keras import backend as K

# K.set_image_dim_ordering('th')  # trick for channels-first models
predict()
```

Visualizing the Segmentation Results

Load the original image

```python
img_org = cv2.imread('/root/data/Lab4/Lab4-data/test.png', cv2.IMREAD_GRAYSCALE)
plt.imshow(img_org)
plt.savefig('img_org.png')
```

[Figure: original test image]

Visualize the prediction

```python
# load the model's prediction
img_label = cv2.imread('/root/data/Lab4/Lab4-data/pre1.png', cv2.IMREAD_GRAYSCALE)
# scale up the pixel values so the classes are visible
img_label *= 30
plt.imshow(img_label)
plt.savefig('img_label.png')
```

[Figure: predicted segmentation mask]

Blending the prediction with the original image shows that the model separates the vegetation from the background quite successfully, essentially meeting expectations.

```python
img_show = cv2.addWeighted(img_org, 0.1, img_label, 0.8, 0, dtype=cv2.CV_32F)
plt.imshow(img_show)
plt.savefig('weight.png')
```

[Figure: prediction overlaid on the original image]

Summary

In unet, every activation is ReLU except the final sigmoid layer, while the default initialization is Xavier, which clearly violates its assumption that neuron activations have zero mean; unet_new uses ReLU throughout and switches to He initialization accordingly.

unet uses 2x2 max pooling, which takes the maximum feature value in each neighborhood.

Forward pass: take the maximum within each window and record the index of that maximum.

[Figure: max pooling forward pass]

Backward pass: the gradient is routed to the recorded max-value index from the forward pass; all other positions are filled with 0.

[Figure: max pooling backward pass]
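The two max pooling passes described above can be sketched in numpy (a toy single-channel implementation for illustration, not the Keras layer):

```python
import numpy as np

# 2x2 max pooling: the forward pass keeps the max of each window and
# remembers its index; the backward pass routes the incoming gradient to
# that index and fills every other position with zero.
def maxpool_fwd_bwd(x, grad_out):
    h, w = x.shape
    out = np.zeros((h // 2, w // 2))
    grad_in = np.zeros_like(x)
    for i in range(h // 2):
        for j in range(w // 2):
            win = x[2*i:2*i+2, 2*j:2*j+2]
            r, c = np.unravel_index(np.argmax(win), (2, 2))
            out[i, j] = win[r, c]
            grad_in[2*i + r, 2*j + c] = grad_out[i, j]  # only the max position
    return out, grad_in

x = np.array([[1., 2.],
              [3., 4.]])
out, grad_in = maxpool_fwd_bwd(x, np.array([[10.]]))
```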

unet_new uses 2x2 average pooling, which simply averages the feature values in each neighborhood.

Forward pass: take the average within each window.

[Figure: average pooling forward pass]

Backward pass: the gradient is divided evenly by the window size and passed to every position in the window.

[Figure: average pooling backward pass]
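Likewise, a toy numpy sketch of average pooling's forward and backward passes:

```python
import numpy as np

# 2x2 average pooling: the forward pass averages each window; the
# backward pass spreads the incoming gradient evenly over all four positions.
def avgpool_fwd_bwd(x, grad_out):
    h, w = x.shape
    out = x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    grad_in = grad_out.repeat(2, axis=0).repeat(2, axis=1) / 4.0
    return out, grad_in

x = np.array([[1., 2.],
              [3., 4.]])
out, grad_in = avgpool_fwd_bwd(x, np.array([[8.]]))
```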

unet_new also uses dropout: in each iteration, some neurons of a layer are randomly and temporarily hidden (dropped) before that round of training and optimization proceeds; the next iteration hides a different random subset, and so on until training ends. Because the dropping is random, each mini-batch effectively trains a different network, which is an effective guard against overfitting.

[Figure: dropout]
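A minimal numpy sketch of (inverted) dropout: kept units are rescaled by 1/keep_prob so the expected activation is unchanged. The rescaling detail is a common implementation choice (and what Keras's Dropout layer does at training time), not something stated in the text above:

```python
import numpy as np

# Keep each unit with probability keep_prob; rescale survivors so that
# the expected value of the output matches the input.
def dropout(x, keep_prob, rng):
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones((1000,))
y = dropout(x, keep_prob=0.8, rng=rng)  # ~80% of units survive, scaled to 1.25
```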

Xavier initialization

Xavier initialization was proposed by Xavier Glorot in the 2010 paper Understanding the difficulty of training deep feedforward neural networks. To avoid exploding or vanishing gradients, two empirical criteria are followed:

  1. The mean of each layer's activations should stay at 0.
  2. The variance of each layer's activations should stay constant.

That is, the activation variance stays constant across layers in the forward pass, and the gradient variance stays constant across layers in the backward pass.
Based on these criteria, the initial weights $W^{[l]}$ (where $l$ is the layer index) should satisfy
$$
\begin{aligned}
W^{[l]} & \sim \mathcal{N}\left(\mu = 0,\ \sigma^{2} = \frac{1}{n^{[l-1]}}\right) \\
b^{[l]} & = 0
\end{aligned}
$$

where $n^{[l-1]}$ is the number of neurons in layer $l-1$. In other words, the initial weights $w$ are drawn from a normal distribution with mean $\mu = 0$ and variance $\sigma^2 = \frac{1}{n^{[l-1]}}$.
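A quick numeric check of this rule in numpy (layer sizes chosen arbitrarily for illustration): with $W \sim \mathcal{N}(0, 1/n_{\text{in}})$ and unit-variance inputs, the pre-activation variance of the next layer stays near 1:

```python
import numpy as np

rng = np.random.default_rng(42)
n_in = 400

# Xavier-style weights: zero mean, variance 1 / n_in.
W = rng.normal(0.0, np.sqrt(1.0 / n_in), size=(n_in, n_in))
x = rng.normal(0.0, 1.0, size=(n_in,))  # unit-variance activations

z = W.T @ x        # pre-activations of the next layer
var_z = z.var()    # should be close to 1, not exploding or vanishing
```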

He initialization (MSRA)

Proposed by Kaiming He in the paper Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. Xavier initialization assumes an activation function symmetric about 0, a condition the widely used ReLU does not satisfy.

Considering only the number of inputs, MSRA initialization draws weights from a Gaussian with mean 0 and variance $\frac{2}{F_{\text{in}}}$:
$$
w \sim \mathcal{N}\left(0,\ \frac{2}{F_{\text{in}}}\right)
$$
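A similar numeric check for He initialization (sizes arbitrary): with ReLU inputs, for which $E[x^2] \approx 1/2$ when the pre-activations are standard normal, setting the weight variance to $2/F_{\text{in}}$ keeps the next layer's pre-activation variance near 1:

```python
import numpy as np

rng = np.random.default_rng(7)
fan_in = 500

# ReLU-activated inputs from a standard normal: E[x^2] is roughly 1/2.
x = np.maximum(rng.normal(size=(fan_in,)), 0.0)

# He-style weights: zero mean, variance 2 / fan_in.
W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_in))

z = W.T @ x        # pre-activations of the next layer
var_z = z.var()    # the factor of 2 compensates for ReLU halving E[x^2]
```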