In this notebook, we leverage the natural data compression provided by tensor decomposition to reduce the number of parameters in a trained convolutional neural network. Specifically, by decomposing the weight tensors associated with each layer of ResNet-50, we find a network that is 7.2% the size of the original yet achieves 99.1% of the original accuracy. Possible benefits of this method include reduced power and memory requirements and faster inference, which would allow state-of-the-art networks to be deployed on the edge. Our application of this technique is demonstrated using Keras, a high-level API used to interface with TensorFlow.
import os
import sys
import logging
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
logging.disable(sys.maxsize)
import numpy as np
from ensign.cp_decomp import cp_als, write_cp_decomp_dir
import tensorflow as tf
from tensorflow.keras import datasets, utils, applications, models, layers, activations, optimizers, callbacks
import tensorflow_model_optimization as tfmot
MEM_LIMIT_GB = 100
In order to use ENSIGN to decompose convolutional layers, the ndarray representations of the weights need to be converted to the ENSIGN tensor format for decomposition. We define decompose_kernel, a wrapper for ensign.cp_decomp.cp_als(), that takes weights as an ndarray, constructs an ENSIGN tensor, and decomposes it using CP-ALS. We choose this decomposition algorithm, which assumes tensor entries are distributed according to Gaussian distributions, because it is best suited for continuous-valued data.
In order to apply the decomposed convolutions to incoming feature maps, the decomposed weight tensor should be used to set the weights of the decomposed layer architecture. The specific decomposed layer used here, implemented in factorized_conv, turns an n-dimensional convolution into a sequence of n+2 1D convolutions. Our implementation takes a decomposition object and the original layer parameters as inputs and infers the appropriate sequence of 1D convolutions.
This technique is derived and discussed in more detail in Speeding-up Convolutional Neural Networks Using Fine-tuned CP-decomposition. The figures below depict the factorization of a 2D convolution into a sequence of four 1D convolutions. In the simplified standard 2D convolution mapping S input channels to T output channels, there are S filters for each output channel. Then for a kernel size of d, the tensor storing the layer weights has size d x d x S x T.
After decomposing the tensor and substituting the form of the decomposition for the kernel in the layer application rule, the following sequence of 1D convolutions is derived. The factor matrices contain the weights for the 1D convolutions.
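The CP form of the kernel can be written out explicitly: the full kernel is a sum of R rank-1 terms built from the factor matrices. The following numpy sketch (using randomly generated stand-in factor matrices, not an actual ENSIGN decomposition) reconstructs a d x d x S x T kernel from hypothetical rank-R factors:

```python
import numpy as np

rng = np.random.default_rng(0)
d, S, T, R = 3, 4, 5, 2

# Stand-in factor matrices of a hypothetical rank-R CP decomposition
lam = rng.normal(size=R)         # component weights
A = rng.normal(size=(d, R))      # vertical spatial factor
B = rng.normal(size=(d, R))      # horizontal spatial factor
C = rng.normal(size=(S, R))      # input-channel factor
D = rng.normal(size=(T, R))      # output-channel factor

# Reconstruct the full kernel: K[i,j,s,t] = sum_r lam_r A[i,r] B[j,r] C[s,r] D[t,r]
kernel = np.einsum('r,ir,jr,sr,tr->ijst', lam, A, B, C, D)
assert kernel.shape == (d, d, S, T)
```

Each of the four 1D convolutions in the factorized layer applies one of these factor matrices along its own axis, so applying them in sequence is equivalent to convolving with the reconstructed kernel.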
The compression results from the fact that the factor matrices describing the 1D convolutions have fewer parameters than the original tensor for appropriate choices of the rank. The original layer has d x d x S x T parameters while the decomposed layer has R x (d + d + S + T) parameters where R is the rank. For example, consider a convolutional layer with a 3 x 3 kernel mapping 32 input channels to 64 output channels. The original layer has 3 x 3 x 32 x 64 = 18,432 parameters. Decomposing the layer with rank 16 results in 16 x (3 + 3 + 32 + 64) = 1,632 parameters.
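The arithmetic above is easy to check directly:

```python
d, S, T, R = 3, 32, 64, 16

original = d * d * S * T            # 3 * 3 * 32 * 64 = 18,432 parameters
decomposed = R * (d + d + S + T)    # 16 * (3 + 3 + 32 + 64) = 1,632 parameters
compression = decomposed / original
print(original, decomposed, round(compression, 4))  # 18432 1632 0.0885
```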
def decompose_kernel(weights, rank, cutoff=1e-8, max_iter=100, directory=None):
    '''
    Wrapper for cp_als. Takes convolution kernel, creates sptensor file, decomposes, and cleans up.
    '''
    shape = weights.shape
    tensor = []
    prods = [np.prod(shape[i:]) for i in range(len(shape))] + [1]
    for i in range(prods[0]):
        idx = ()
        for d in prods[1:]:
            n = i // d
            idx += (n,)
            i -= n * d
        value = weights[idx]
        if np.abs(value) > cutoff:
            tensor.append(list(idx) + [value])
    out = open('tensor_data.txt', 'w')
    out.write('sptensor\n' + str(len(shape)) + '\n' + ' '.join(map(str, shape)) + '\n' + str(len(tensor)) + '\n')
    for entry in tensor:
        out.write(' '.join(map(str, entry)) + '\n')
    out.close()
    decomp = cp_als('./tensor_data.txt', rank, max_iter=max_iter, mem_limit_gb=MEM_LIMIT_GB)
    if directory:
        write_cp_decomp_dir(directory, decomp, True)
    os.remove('tensor_data.txt')
    return decomp
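The index bookkeeping inside decompose_kernel deserves a note: the inner loop converts a flat index into a multi-index using products of trailing dimensions, which is exactly row-major (C-order) unraveling. A quick standalone check against NumPy's unravel_index, using a small hypothetical shape:

```python
import numpy as np

shape = (2, 3, 4)
# Products of trailing dimensions, as in decompose_kernel
prods = [int(np.prod(shape[i:])) for i in range(len(shape))] + [1]

for i in range(prods[0]):
    idx = ()
    j = i
    for d in prods[1:]:
        n = j // d
        idx += (n,)
        j -= n * d
    # The hand-rolled conversion agrees with NumPy's row-major unraveling
    assert idx == np.unravel_index(i, shape)
```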
def factorized_conv(decomp, input_shape, strides, padding, bias, set_weights=True):
    '''
    Takes a decomposition of a convolutional kernel and returns a factorized layer
    '''
    rank = decomp.rank
    n_dims = decomp.order - 2
    factors = decomp.factors
    weights = decomp.weights
    input_layer = layers.Input(shape=input_shape)
    x = layers.Conv1D(filters=factors[-2].shape[1], kernel_size=1, use_bias=False)(input_layer)
    for i in range(n_dims):
        permute = list(range(1, n_dims + 2))
        d = permute.pop(i)
        permute.insert(-1, d)
        x = layers.Permute(tuple(permute))(x)
        x = layers.Conv1D(filters=factors[i].shape[1],
                          kernel_size=factors[i].shape[0],
                          strides=strides[i],
                          padding=padding,
                          groups=rank,
                          use_bias=False)(x)
        permute = list(range(1, n_dims + 2))
        d = permute.pop(-2)
        permute.insert(i, d)
        x = layers.Permute(tuple(permute))(x)
    x = layers.Conv1D(filters=factors[-1].shape[0], kernel_size=1, use_bias=True)(x)
    fact_conv = models.Model(inputs=[input_layer], outputs=[x])
    if set_weights:
        fact_conv.layers[1].set_weights([np.expand_dims(factors[-2], axis=0)])
        for i in range(n_dims):
            fact_conv.layers[3 + 3 * i].set_weights([np.expand_dims(factors[i], axis=1)])
        fact_conv.layers[-1].set_weights([np.expand_dims((factors[-1] * weights).T, axis=0), bias])
    return fact_conv
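The Permute index construction above is the trickiest part of the layer: before the i-th spatial convolution, spatial axis i is cycled into the second-to-last (steps) position so that Conv1D convolves along it, and afterwards the move is undone. A standalone sketch of just that bookkeeping (pure Python, no Keras) makes the pattern visible for the 2D case:

```python
def conv_permutations(n_dims):
    # Reproduces the Permute index construction in factorized_conv.
    # Keras Permute indices are 1-based over the non-batch dimensions.
    pairs = []
    for i in range(n_dims):
        before = list(range(1, n_dims + 2))
        d = before.pop(i)
        before.insert(-1, d)   # cycle spatial axis i into the steps position
        after = list(range(1, n_dims + 2))
        d = after.pop(-2)
        after.insert(i, d)     # move it back after the 1D convolution
        pairs.append((tuple(before), tuple(after)))
    return pairs

print(conv_permutations(2))  # [((2, 1, 3), (2, 1, 3)), ((1, 2, 3), (1, 2, 3))]
```

For n_dims = 2 the first spatial convolution swaps height and width (a self-inverse permutation) and the second needs no permutation at all, since width already sits in the steps position.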
The original network must be modified to replace layers with their decomposed counterparts. In order to handle arbitrary network architectures, including non-sequential models like ResNet-50 that contain skip connections, we implement replace_layers. This "network surgery" tool inspects the input and output nodes of each layer to reconstruct the model's computational graph with the decomposed layers substituted for the originals.
def replace_layers(model, layers_to_replace, fact_convs):
    '''
    Takes a model and decomposed layers and returns a new model with those layers substituted
    '''
    if type(layers_to_replace) != list:
        layers_to_replace = [layers_to_replace]
    if type(fact_convs) != list:
        fact_convs = [fact_convs]
    node_inputs_map = {}  # map from nodes to the names of the layers that feed them
    layer_outputs = {}    # map from layer names to functional outputs
    for l in model.layers:
        for n in l.outbound_nodes:
            if n not in node_inputs_map:
                node_inputs_map[n] = [l.name]
            else:
                node_inputs_map[n].append(l.name)
    # input layer
    x = model.layers[0].output
    layer_outputs[model.layers[0].name] = x
    # other layers
    for i, l in enumerate(model.layers[1:], start=1):
        assert len(l.inbound_nodes) == 1
        inputs = [layer_outputs[name] for name in node_inputs_map[l.inbound_nodes[0]]]
        inputs = inputs[0] if len(inputs) == 1 else inputs
        if i in layers_to_replace:
            idx = layers_to_replace.index(i)
            x = fact_convs[idx](inputs)
        else:
            x = l(inputs)
        layer_outputs[l.name] = x
    fact_model = models.Model(inputs=[model.layers[0].input], outputs=[x])
    for l in fact_model.layers:
        l.trainable = True
    return fact_model
Finally, we implement factorized_cnn to coordinate decomposing weight tensors, constructing decomposed layers, and performing network surgery. This function takes a convolutional network and either a rank for decomposing the layers or a desired compression rate, and returns a convolutional network with each layer factorized.
def compute_rank(w, b, r):
    return int((r * (np.prod(w.shape) + b.shape[0]) - b.shape[0]) / np.sum(w.shape))

def factorized_cnn(model, rank=0.5, layers_to_factorize=None, directory=None, set_weights=True):
    '''
    Takes a model and indices of layers to decompose and returns a model with those layers factorized
    '''
    if not layers_to_factorize:
        layers_to_factorize = []
        for i, l in enumerate(model.layers):
            if type(l) in [layers.Conv1D, layers.Conv2D, layers.Conv3D]:
                layers_to_factorize.append(i)
    elif type(layers_to_factorize) != list:
        layers_to_factorize = [layers_to_factorize]
    fact_convs = []
    for i in layers_to_factorize:
        layer = model.layers[i]
        weights, bias = layer.get_weights()
        r = rank if type(rank) == int else compute_rank(weights, bias, rank)
        layer_dir = '{}/layer_{}'.format(directory, i) if directory else None
        max_iter = 100 if set_weights else 1
        decomp = decompose_kernel(weights, r, max_iter=max_iter, directory=layer_dir)
        fact_conv = factorized_conv(decomp,
                                    layer.input_shape[1:],
                                    layer.strides,
                                    layer.padding,
                                    bias,
                                    set_weights)
        fact_convs.append(fact_conv)
    fact_model = replace_layers(model, layers_to_factorize, fact_convs)
    return fact_model
Our specific application is to decompose each layer of ResNet-50 trained on the CIFAR-10 dataset. The resulting network is 7.2% the size of the original, yet after fine-tuning it achieves 99.1% of the original accuracy. The network consists of two parts: the feature extractor and the classifier. We decompose the feature extractor, which contains the convolutional layers.
''' Model '''
model_dir = 'resnet_cifar'
model = models.load_model(model_dir)
resnet = model.layers[0]
''' Data '''
(x_train, y_train), (x_test, y_test) = datasets.cifar10.load_data()
x_train = applications.resnet50.preprocess_input(x_train)
x_test = applications.resnet50.preprocess_input(x_test)
y_train = utils.to_categorical(y_train, 10)
y_test = utils.to_categorical(y_test, 10)
k = int(0.7 * len(x_test))
x_val = x_test[:k]
y_val = y_test[:k]
x_test = x_test[k:]
y_test = y_test[k:]
The model achieves accuracy of 79.4% on the test data and the feature extractor has 23,587,712 parameters. These numbers are the baseline and serve as the point of comparison for the factorized model.
_, base_acc = model.evaluate(x_test, y_test)
base_n_param = resnet.count_params()
print('Parameters: {}'.format(base_n_param))
94/94 [==============================] - 3s 9ms/step - loss: 0.6994 - accuracy: 0.7937 Parameters: 23587712
We decompose each convolutional layer in ResNet-50 with rank 32, then reuse the classifier from the original model on top of the decomposed feature extractor. The lower the rank, the greater the compression; we found that decomposing each layer with 32 components resulted in a good tradeoff between compression and accuracy.
%%capture
conv_layers = []
for i, l in enumerate(resnet.layers):
    if type(l) == layers.Conv2D:
        conv_layers.append(i)
rank = 32
fact_resnet = factorized_cnn(resnet, rank, conv_layers,
                             directory='{}_fact_data_all'.format(model_dir))
fact_input = layers.Input(shape=(32, 32, 3,))
x = fact_resnet(fact_input)
for l in model.layers[1:]:  # re-use classifier for decomposed ResNet
    x = l(x)
fact_model = models.Model(inputs=[fact_input], outputs=[x])
for l in fact_model.layers:
    l.trainable = True
Finally, we fine-tune the decomposed model to achieve higher test accuracy by performing end-to-end training.
%%capture
fact_model.compile(optimizer=optimizers.Adam(learning_rate=0.0001),
                   loss='categorical_crossentropy',
                   metrics=['acc'])
history = fact_model.fit(x_train, y_train,
                         epochs=50, batch_size=128,
                         validation_data=(x_val, y_val))
directory = 'fact_network'
fact_model.save(directory)
The decomposed model achieves accuracy of 78.7% on the test data and the feature extractor has 1,709,536 parameters. So the decomposed model is 7.2% the size of the original while achieving 99.1% of its accuracy.
_, acc = fact_model.evaluate(x_test, y_test)
n_param = fact_resnet.count_params()
print('Parameters: {}'.format(n_param))
94/94 [==============================] - 2s 14ms/step - loss: 1.2780 - acc: 0.7867 Parameters: 1709536
n_param / base_n_param, acc / base_acc
(0.0724757026031181, 0.9911802147824977)
Following the work of Speeding-up Convolutional Neural Networks Using Fine-tuned CP-decomposition, we applied the compression technique to a much larger network and replicated the quality of results. We anticipate that applying this technique will enable the deployment of state-of-the-art architectures on the edge.
The natural representation of neural network layers as tensors allows for tensor decomposition to compress the number of parameters in each layer. The resulting decomposed layers require both less memory and less computation than their undecomposed counterparts. While we demonstrate the technique on convolutional layers, analogous implementations are possible for dense and recurrent layers as well as transformers. Future effort will be put toward factorizing models used for natural language processing, such as BERT.
It is important to note that this technique, which operates on the architecture of the neural network, is orthogonal to any compiler optimizations used to accelerate inference. In fact, these types of techniques may complement one another.
Additional future work will involve using pruning to sparsify the compressed network, further reducing the number of parameters.