This article is about 4,500 words; suggested reading time: 10 minutes.
In this post we focus on one of the biggest limitations of PixelCNNs (namely, the blind spot) and on how it can be fixed.
Blind spot
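Because of how the masks are applied, the original PixelCNN's convolutions never see a triangle of pixels above and to the right of the pixel being predicted, no matter how many layers are stacked. To make this concrete, here is a minimal sketch of our own (not part of the original implementation) that propagates the receptive field of a single output pixel through a stack of 3x3 mask-B convolutions; the zeros that remain in the upper-right region are the blind spot:

import numpy as np

# A 3x3 mask-B convolution reads these offsets (dy, dx) relative to the
# output pixel: the row above, plus the centre row up to the pixel itself.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 0)]

def grow_receptive_field(rf, offsets):
    """One masked conv layer: every pixel already in the receptive field
    drags in the input pixels its (masked) kernel reads."""
    new = rf.copy()
    for y, x in zip(*np.nonzero(rf)):
        for dy, dx in offsets:
            yy, xx = y + dy, x + dx
            if 0 <= yy < rf.shape[0] and 0 <= xx < rf.shape[1]:
                new[yy, xx] = True
    return new

H = W = 9
rf = np.zeros((H, W), dtype=bool)
rf[H // 2, W // 2] = True          # the pixel we are predicting
for _ in range(4):                 # stack four masked conv layers
    rf = grow_receptive_field(rf, OFFSETS)

print(rf.astype(int))
# The pixels above and to the right of the diagonal stay at 0 however
# many layers we stack: that is the blind spot.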
Gated PixelCNN
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow import nn
from tensorflow.keras import initializers


class MaskedConv2D(keras.layers.Layer):
    """Convolutional layer with mask.

    Simple implementation of masks of type A and B (centre row only, used
    for the horizontal stack) and type V (used for the vertical stack) for
    autoregressive models.

    Arguments:
        mask_type: one of `"A"`, `"B"` or `"V"`.
        filters: Integer, the dimensionality of the output space
            (i.e. the number of output filters in the convolution).
        kernel_size: An integer specifying the height and width of the
            2D convolution window.
        strides: An integer specifying the stride of the convolution
            along the height and width.
        padding: one of `"valid"` or `"same"` (case-insensitive).
        kernel_initializer: Initializer for the `kernel` weights matrix.
        bias_initializer: Initializer for the bias vector.
    """

    def __init__(self,
                 mask_type,
                 filters,
                 kernel_size,
                 strides=1,
                 padding='same',
                 kernel_initializer='glorot_uniform',
                 bias_initializer='zeros'):
        super(MaskedConv2D, self).__init__()

        assert mask_type in {'A', 'B', 'V'}
        self.mask_type = mask_type
        self.filters = filters
        self.kernel_size = kernel_size
        self.strides = strides
        self.padding = padding.upper()
        self.kernel_initializer = initializers.get(kernel_initializer)
        self.bias_initializer = initializers.get(bias_initializer)

    def build(self, input_shape):
        kernel_h = self.kernel_size
        kernel_w = self.kernel_size

        self.kernel = self.add_weight('kernel',
                                      shape=(kernel_h,
                                             kernel_w,
                                             int(input_shape[-1]),
                                             self.filters),
                                      initializer=self.kernel_initializer,
                                      trainable=True)

        self.bias = self.add_weight('bias',
                                    shape=(self.filters,),
                                    initializer=self.bias_initializer,
                                    trainable=True)

        mask = np.ones(self.kernel.shape, dtype=np.float32)

        # Get centre of the filter for even or odd dimensions
        if kernel_h % 2 != 0:
            center_h = kernel_h // 2
        else:
            center_h = (kernel_h - 1) // 2

        if kernel_w % 2 != 0:
            center_w = kernel_w // 2
        else:
            center_w = (kernel_w - 1) // 2

        if self.mask_type == 'V':
            # Vertical stack: keep every row up to and including the centre
            # row (the feature map is shifted down later to stay causal).
            mask[center_h + 1:, :, :, :] = 0.
        else:
            # Horizontal stack: keep only the centre row, up to (mask A)
            # or including (mask B) the centre pixel.
            mask[:center_h, :, :, :] = 0.
            mask[center_h, center_w + (self.mask_type == 'B'):, :, :] = 0.
            mask[center_h + 1:, :, :, :] = 0.

        self.mask = tf.constant(mask, dtype=tf.float32, name='mask')

    def call(self, input):
        masked_kernel = tf.math.multiply(self.mask, self.kernel)
        x = nn.conv2d(input,
                      masked_kernel,
                      strides=[1, self.strides, self.strides, 1],
                      padding=self.padding)
        x = nn.bias_add(x, self.bias)
        return x
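As a quick sanity check (our own snippet, with toy shapes), we can build one layer of each mask type and print the resulting 3x3 mask: 'V' keeps everything down to the centre row, while 'A' and 'B' keep only the centre row of the horizontal stack.

# Quick sanity check of the three mask types (toy shapes)
for mask_type in ('V', 'A', 'B'):
    layer = MaskedConv2D(mask_type=mask_type, filters=4, kernel_size=3)
    layer.build(input_shape=(None, 28, 28, 1))   # triggers mask creation
    print(mask_type)
    print(layer.mask[:, :, 0, 0].numpy())
# V keeps rows 0..centre fully; A keeps [1 0 0] in the centre row only;
# B keeps [1 1 0] in the centre row only.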
Gated activation units (or gated blocks)
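In the Gated PixelCNN, the rectified linear units between the masked convolutions are replaced by gated activation units, which combine a tanh "feature" path with a sigmoid "gate" path:

y = tanh(W_f ∗ x) ⊙ σ(W_g ∗ x)

where ∗ denotes convolution and ⊙ element-wise multiplication. In the implementation below, each masked convolution outputs 2 × filters channels, and the _gate method splits them into the two halves feeding tanh and sigmoid.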
A single-layer block of the Gated PixelCNN
class GatedBlock(tf.keras.Model):
    """Gated block that composes the Gated PixelCNN."""

    def __init__(self, mask_type, filters, kernel_size):
        super(GatedBlock, self).__init__(name='')

        self.mask_type = mask_type
        self.vertical_conv = MaskedConv2D(mask_type='V',
                                          filters=2 * filters,
                                          kernel_size=kernel_size)
        self.horizontal_conv = MaskedConv2D(mask_type=mask_type,
                                            filters=2 * filters,
                                            kernel_size=kernel_size)
        self.padding = keras.layers.ZeroPadding2D(padding=((1, 0), (0, 0)))
        self.cropping = keras.layers.Cropping2D(cropping=((0, 1), (0, 0)))
        self.v_to_h_conv = keras.layers.Conv2D(filters=2 * filters, kernel_size=1)
        self.horizontal_output = keras.layers.Conv2D(filters=filters, kernel_size=1)

    def _gate(self, x):
        # Gated activation unit: tanh(W_f * x) * sigmoid(W_g * x)
        tanh_preactivation, sigmoid_preactivation = tf.split(x, 2, axis=-1)
        return tf.nn.tanh(tanh_preactivation) * tf.nn.sigmoid(sigmoid_preactivation)

    def call(self, input_tensor):
        v = input_tensor[0]
        h = input_tensor[1]

        vertical_preactivation = self.vertical_conv(v)

        # Shift the vertical stack feature map down before feeding it into
        # the horizontal stack, to ensure causality
        v_to_h = self.padding(vertical_preactivation)
        v_to_h = self.cropping(v_to_h)
        v_to_h = self.v_to_h_conv(v_to_h)

        horizontal_preactivation = self.horizontal_conv(h)

        v_out = self._gate(vertical_preactivation)

        horizontal_preactivation = horizontal_preactivation + v_to_h
        h_activated = self._gate(horizontal_preactivation)
        h_activated = self.horizontal_output(h_activated)

        if self.mask_type == 'A':
            h_out = h_activated
        elif self.mask_type == 'B':
            # Residual connection on the horizontal stack (type B blocks only)
            h_out = h + h_activated

        return v_out, h_out
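A minimal smoke test (our own snippet; the MNIST-like 28x28 shapes are just example values) confirms that the two stacks flow through a block:

# Minimal smoke test with MNIST-like shapes (example values)
x = tf.zeros((1, 28, 28, 1))
block_a = GatedBlock(mask_type='A', filters=64, kernel_size=7)
v_out, h_out = block_a([x, x])
print(v_out.shape, h_out.shape)   # (1, 28, 28, 64) (1, 28, 28, 64)

# Type-B blocks add a residual on the horizontal stack, so their input
# must already have `filters` channels:
block_b = GatedBlock(mask_type='B', filters=64, kernel_size=3)
v_out, h_out = block_b([v_out, h_out])
print(v_out.shape, h_out.shape)   # (1, 28, 28, 64) (1, 28, 28, 64)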
In summary, the gated blocks fix the blind spot in the receptive field and improve model performance.
Comparing the results
In the original paper, the PixelCNN uses the following architecture: the first layer is a masked convolution (type A) with a 7x7 filter, followed by 15 residual blocks. Each block processes the data with a combination of a 3x3 convolution with a type B mask and standard 1x1 convolutions, with ReLU activations between the convolutional layers. The blocks also include residual connections, as sketched below.
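For illustration only, here is our own reconstruction of one such residual block. StandardMaskedConv2D is a hypothetical stand-in for a masked convolution with the full type-B mask (rows above plus the centre row); it differs from the row-only A/B masks of the MaskedConv2D defined earlier in this post.

# Our own sketch of one residual block of the original PixelCNN
# (`h` is the number of features; StandardMaskedConv2D is hypothetical).
def residual_block(x, h=64):
    skip = x                                                  # input has 2h channels
    y = keras.layers.Activation('relu')(x)
    y = keras.layers.Conv2D(filters=h, kernel_size=1)(y)      # 1x1: 2h -> h
    y = keras.layers.Activation('relu')(y)
    y = StandardMaskedConv2D(mask_type='B', filters=h, kernel_size=3)(y)
    y = keras.layers.Activation('relu')(y)
    y = keras.layers.Conv2D(filters=2 * h, kernel_size=1)(y)  # 1x1: h -> 2h
    return keras.layers.Add()([skip, y])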
We therefore trained both a PixelCNN and a Gated PixelCNN, and compare the results below.
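For reference, here is a rough sketch (our own assembly, with hypothetical hyperparameters rather than the exact training setup) of how the gated blocks can be chained into a full Gated PixelCNN: one type-A block first, a stack of type-B blocks, then 1x1 convolutions producing a 256-way distribution over intensities for every pixel.

# Our own sketch of a full Gated PixelCNN (hypothetical hyperparameters)
def build_gated_pixelcnn(height=28, width=28, n_channel=1,
                         filters=64, n_blocks=7):
    inputs = keras.layers.Input(shape=(height, width, n_channel))
    v, h = GatedBlock(mask_type='A', filters=filters, kernel_size=7)([inputs, inputs])
    for _ in range(n_blocks):
        v, h = GatedBlock(mask_type='B', filters=filters, kernel_size=3)([v, h])
    x = keras.layers.Activation('relu')(h)
    x = keras.layers.Conv2D(filters=128, kernel_size=1)(x)
    x = keras.layers.Activation('relu')(x)
    # 256-way logits over pixel intensities, as in the PixelCNN family
    x = keras.layers.Conv2D(filters=256, kernel_size=1)(x)
    return keras.Model(inputs=inputs, outputs=x)

model = build_gated_pixelcnn()
model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True))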
Comparing the MNIST predictions of the PixelCNN and the Gated PixelCNN (figure above), we did not see a large improvement on MNIST. Some digits that were previously predicted correctly are now predicted incorrectly. This does not mean the Gated PixelCNN performs poorly. In the next post we will discuss PixelCNN++ (which also uses gating), so stay tuned!
