Building Simple Neural Networks with tf.keras

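Starting with version 2.0, TensorFlow has Keras deeply integrated¹. TensorFlow 2.0 also changed in many respects compared with 1.x. Here I try using tf.keras to build simple neural networks for regression and classification, recording the process and tidying up my notes along the way.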

Import

Copy, paste, and import everything in one go. Quick, efficient, and blunt:

import tensorflow as tf
from tensorflow import keras

import matplotlib as mpl
import matplotlib.pyplot as plt

# Needed to display figures inline in Jupyter Notebook
%matplotlib inline

import numpy as np
import pandas as pd
import sklearn
import os
import sys
import time

# Print the version of Python and of each library
print(sys.version_info)
for module in tf, keras, mpl, np, pd, sklearn:
    print(module.__name__, module.__version__)

For the record, these are the versions I am currently using:

sys.version_info(major=3, minor=7, micro=6, releaselevel='final', serial=0)
tensorflow 2.0.0
tensorflow_core.keras 2.2.4-tf
matplotlib 3.1.1
numpy 1.18.1
pandas 0.25.3
sklearn 0.22.1

Regression

Dataset

For the regression problem, the dataset is the California housing data from sklearn:

from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()

Data Split

Split the data with the default 3:1 ratio (that is, test_size=0.25) to obtain the training, validation, and test sets.

from sklearn.model_selection import train_test_split

x_train_all, x_test, y_train_all, y_test = train_test_split(
    housing.data, housing.target, random_state = 7)
x_train, x_valid, y_train, y_valid = train_test_split(
    x_train_all, y_train_all, random_state = 11)

print(x_train.shape, y_train.shape) # (11610, 8) (11610,)
print(x_valid.shape, y_valid.shape) # (3870, 8) (3870,)
print(x_test.shape, y_test.shape) # (5160, 8) (5160,)
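To make the split sizes above less magical, here is a small sketch that repeats the two-stage split on a placeholder array with the same number of rows as the housing data (20640, the sum of the three shapes printed above). The zero-filled arrays and the variable names are stand-ins of mine, not part of the original experiment.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same shape as housing.data / housing.target
# (20640 rows, 8 features); the values themselves do not matter here.
features = np.zeros((20640, 8))
targets = np.zeros(20640)

# First split: the default test_size=0.25 sets aside 25% as the test set.
x_rest, x_test, y_rest, y_test = train_test_split(
    features, targets, random_state=7)
# Second split: the remaining 75% is again split 3:1 into train and validation.
x_train, x_valid, y_train, y_valid = train_test_split(
    x_rest, y_rest, random_state=11)

split_sizes = (len(x_train), len(x_valid), len(x_test))
print(split_sizes)
```

So applying the default ratio twice ends up with roughly 56% of the rows for training, 19% for validation, and 25% for testing.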

Normalization

Without further ado, standardize the features:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_valid_scaled = scaler.transform(x_valid)
x_test_scaled = scaler.transform(x_test)

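Here I want to briefly cover the difference between fit_transform() and transform(), a detail I puzzled over for a while. A third function, fit(), also comes into it. The official API documentation² describes the three as follows: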

fit(self, X[, y])
Compute the mean and std to be used for later scaling.

transform(self, X[, copy])
Perform standardization by centering and scaling.

fit_transform(self, X[, y])
Fit to data, then transform it.

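In short, fit_transform() combines fit() and transform(). When I call fit_transform() on the training set, it first computes the mean and standard deviation of each feature and stores them, which is what fit() does. It then standardizes the data, which is transform().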

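The validation and test sets use the training set's mean and standard deviation. In other words, once the scaler has been fitted on the training set, the validation and test sets can be standardized directly by calling transform().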
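The relationship between the three functions can be checked on a toy example. This is only a minimal sketch: the tiny train and valid arrays are made up for illustration, while mean_ and scale_ are the attributes in which StandardScaler keeps the fitted statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up single-feature data, just for illustration.
train = np.array([[1.0], [2.0], [3.0]])
valid = np.array([[2.0], [4.0]])

# One step: fit on the training data and transform it.
scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)

# Two steps: fit() first, then transform(); the result should be identical.
two_step_scaler = StandardScaler().fit(train)
train_scaled_two_step = two_step_scaler.transform(train)

# The validation data is transformed with the training statistics only.
valid_scaled = scaler.transform(valid)
valid_manual = (valid - scaler.mean_) / scaler.scale_

print(scaler.mean_, scaler.scale_)
print(valid_scaled.ravel())
```

Note that valid_scaled does not have zero mean: it is shifted and scaled by the training set's statistics, not its own.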

Model

Compared with 1.x, building a network in TensorFlow 2.0 really is a pleasure!

# Build the network
model = keras.models.Sequential([
    keras.layers.Dense(30, activation='relu', input_shape=x_train.shape[1:]),
    keras.layers.Dense(1),
])

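# Compile the model
model.compile(loss="mean_squared_error",
              optimizer=keras.optimizers.SGD(0.001))

# Stop training once the monitored loss (val_loss by default) has not
# improved by at least 1e-2 for 5 consecutive epochs
callbacks = [keras.callbacks.EarlyStopping(
    patience=5, min_delta=1e-2)]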

model.summary() shows the structure: just two Dense layers, one taking the input and one producing the output:

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 30)                270
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 31
=================================================================
Total params: 301
Trainable params: 301
Non-trainable params: 0
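The parameter counts in the summary can be reproduced by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. The helper dense_param_count below is a name I made up for this sketch; the 8 comes from the number of input features in the housing data.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden_params = dense_param_count(8, 30)   # 8 features -> 30 hidden units
output_params = dense_param_count(30, 1)   # 30 hidden units -> 1 output
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)
```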

Train

Training starts with model.fit(). Things like with tf.Session() as sess: or even tf.global_variables_initializer() are no longer needed here:

history = model.fit(x_train_scaled, y_train,
                    validation_data = (x_valid_scaled, y_valid),
                    epochs = 100,
                    callbacks = callbacks)

Plot

def plot_learning_curves(history):
    pd.DataFrame(history.history).plot(figsize=(8,5))
    plt.grid(True)
    plt.gca().set_ylim(0,1)
    plt.show()

plot_learning_curves(history)

Evaluate

model.evaluate(x_test_scaled, y_test, verbose=0)

Here is the result on the test set:

0.3782297415326732

Classification

Dataset & Data Split

For the classification problem, the dataset is fashion_mnist from Keras:

# Use the fashion_mnist dataset
fashion_mnist = keras.datasets.fashion_mnist

# Load the training and test sets
(x_train_all, y_train_all), (x_test, y_test) = fashion_mnist.load_data()

# Split off a validation set from the training set
x_valid, x_train = x_train_all[0:5000], x_train_all[5000:]
y_valid, y_train = y_train_all[0:5000], y_train_all[5000:]

print(x_valid.shape, y_valid.shape) # (5000, 28, 28) (5000,)
print(x_train.shape, y_train.shape) # (55000, 28, 28) (55000,)
print(x_test.shape, y_test.shape) # (10000, 28, 28) (10000,)

Normalization

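Because fit_transform() expects a 2-D array, x_train has to be reshaped from [None, 28, 28] into two dimensions and back again afterwards, so reshape() is called twice. Note that the code below flattens everything into a single column with reshape(-1, 1), that is [None, 1] rather than [None, 784], so one mean and one standard deviation are computed over all pixels: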

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

x_train_scaled = scaler.fit_transform(
    x_train.astype(np.float32).reshape(-1, 1)).reshape(-1, 28, 28)
x_valid_scaled = scaler.transform(
    x_valid.astype(np.float32).reshape(-1, 1)).reshape(-1, 28, 28)
x_test_scaled = scaler.transform(
    x_test.astype(np.float32).reshape(-1, 1)).reshape(-1, 28, 28)
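The reshape round trip can be tried on a small stand-in batch. In this sketch the random images array is my own placeholder for x_train (10 fake 28x28 images), so the numbers mean nothing; the point is only the shapes and the single fitted mean.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Placeholder for x_train: 10 random "images" with pixel values in 0..255.
rng = np.random.RandomState(0)
images = rng.randint(0, 256, size=(10, 28, 28)).astype(np.float32)

scaler = StandardScaler()
# Flatten every pixel into one column, standardize, then restore the shape.
column = images.reshape(-1, 1)
scaled = scaler.fit_transform(column).reshape(-1, 28, 28)

print(column.shape)        # one row per pixel
print(scaled.shape)        # back to the image layout
print(scaler.mean_.shape)  # a single mean shared by all pixels
```

Reshaping to (-1, 784) instead would make the scaler treat each pixel position as its own feature and fit 784 separate means, which is a different kind of preprocessing.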

Model

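A pleasure once again. With the tf.keras API, adding layers while building a network is really convenient, and there is more than one way to do it. One is the approach already used above, passing a list to Sequential():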

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
])

model.compile(loss="sparse_categorical_crossentropy",
             optimizer = keras.optimizers.SGD(0.001),
             metrics = ["accuracy"])

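The loss here is sparse_categorical_crossentropy because the labels y in this dataset are plain integer class indices; y_train[0], for example, is 4. Cross-entropy is computed against a probability vector, so such a label conceptually has to be one-hot encoded first, and the sparse variant takes care of that for integer labels. If y were already one-hot vectors, categorical_crossentropy would be the one to use.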
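To see why an integer label and its one-hot vector lead to the same loss, here is a small NumPy sketch of the cross-entropy for a single sample. The predicted probabilities are made up, and this is a hand-written illustration of the formula rather than the Keras implementation.

```python
import numpy as np

# Made-up softmax output over 10 classes (sums to 1).
probs = np.full(10, 0.05)
probs[4] = 0.55
label = 4  # integer class index, like y_train[0]

# "Sparse" form: pick the probability of the true class directly.
sparse_loss = -np.log(probs[label])

# One-hot form: build the vector first, then sum over all classes.
one_hot = np.zeros(10)
one_hot[label] = 1.0
dense_loss = -np.sum(one_hot * np.log(probs))

print(sparse_loss, dense_loss)
```

Both expressions give the same value, since every term except the true class is multiplied by zero.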

Alternatively, layers can be added by calling add():

model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[28, 28]))
model.add(keras.layers.Dense(300, activation="relu"))
model.add(keras.layers.Dense(100, activation="relu"))
model.add(keras.layers.Dense(10, activation="softmax"))

As a result, some deep neural networks can be built directly with a for loop, for example:

for _ in range(20):
    model.add(keras.layers.Dense(100, activation='selu'))

If other layers come afterwards, just add them after the loop.
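Put together, a complete model built this way might look like the sketch below. It simply combines the Flatten input layer and softmax output layer used earlier with the 20-layer loop; the exact depth and width are the illustrative values from the loop above, not a tuned architecture.

```python
from tensorflow import keras

model = keras.models.Sequential()
# Input: 28x28 images flattened to 784 values.
model.add(keras.layers.Flatten(input_shape=[28, 28]))
# 20 identical hidden layers added in a loop.
for _ in range(20):
    model.add(keras.layers.Dense(100, activation="selu"))
# Output layer follows the loop, one unit per class.
model.add(keras.layers.Dense(10, activation="softmax"))

n_layers = len(model.layers)
print(n_layers)
```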

Train

history = model.fit(x_train_scaled, y_train,
                    epochs=10,
                    validation_data=(x_valid_scaled, y_valid))

Plot

def plot_learning_curves(history):
    pd.DataFrame(history.history).plot(figsize=(8, 5))
    plt.grid(True)
    plt.gca().set_ylim(0, 1)
    plt.show()
    
plot_learning_curves(history)

Evaluate

A quick look at the result:

model.evaluate(x_test_scaled, y_test, verbose=0)

[0.4324875540494919, 0.8463]

About the Callback Output Path

During training I ran into a problem with the callback output path, which looks like a TensorFlow bug. For example, when I want to save the model file, I first create a directory:

logdir = "./callbacks"

if not os.path.exists(logdir):
    os.mkdir(logdir)
output_model_file = os.path.join(logdir, "blahblah_model.h5")

callbacks = [
    keras.callbacks.ModelCheckpoint(output_model_file,
                                    save_best_only = True),
]

Since I have been working on Windows lately, I ran into this error:

ProfilerNotRunningError: Cannot stop profiling. No profiler is running.

Put plainly, the path must not contain /. This problem is also discussed in a GitHub issue³.

So on Mac and Linux:

logdir = "./callbacks"

while on Windows, use:

logdir = "callbacks"
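One way to avoid hard-coding a separator at all is to let os.path.join assemble the path. The sketch below is only that, a sketch: the temporary base directory is an assumption of mine so the example does not litter the working directory, and I have not verified that this avoids the error on every TensorFlow version.

```python
import os
import tempfile

# Assumption: a throwaway base directory; in a real project this would
# be the project folder or simply omitted.
base_dir = tempfile.mkdtemp()

# Build the path from components instead of writing "./" by hand.
logdir = os.path.join(base_dir, "callbacks")
# exist_ok=True replaces the explicit os.path.exists() check.
os.makedirs(logdir, exist_ok=True)

output_model_file = os.path.join(logdir, "blahblah_model.h5")
print(output_model_file)
```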

One last thing I have to say:

Eager Mode, YES!

References

[1] Google, TensorFlow API Module: tf.keras Overview

[2] scikit-learn API: sklearn.preprocessing.StandardScaler

[3] GitHub Issue, TensorFlow keras callback using tensorboard, “ProfilerNotRunningError: Cannot stop profiling. No profiler is running.” #2279