Sampling-based gradient regularization for capturing long-term dependencies in recurrent neural networks

Artem Chernodub; Dimitri Nowicki

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2016) 9948 LNCS 90-97

DOI: 10.1007/978-3-319-46672-9_11

2 Citations

16 Readers

Abstract

The vanishing (and exploding) gradient effect is a common problem for recurrent neural networks that use the backpropagation method to calculate derivatives. We construct an analytical framework to estimate the contribution of each training example to the norm of the long-term components of the target function's gradient, and use it to hold the norm of the gradients in a suitable range. Using this subroutine, we can construct mini-batches for stochastic gradient descent (SGD) training that lead to high performance and accuracy of the trained network, even for very complex tasks. To check our framework experimentally, we use special synthetic benchmarks that test the ability of RNNs to capture long-term dependencies. Our network can detect links between events in a (temporal) sequence at a range of 100 and longer.
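The idea of scoring each training example by its long-term gradient norm and then sampling mini-batches from the examples in a suitable range can be sketched roughly as follows. This is only an illustrative sketch, not the authors' framework: it assumes a plain tanh RNN with h_t = tanh(W h_{t-1} + U x_t), uses the spectral norm of the Jacobian product dh_T/dh_{T-k} as a stand-in for the "long-term component", and the function names `long_term_gradient_norms` and `select_minibatch` as well as the `low`/`high` thresholds are hypothetical.

```python
import numpy as np


def long_term_gradient_norms(W, U, inputs, horizon):
    """Per-sequence norm of dh_T/dh_{T-horizon} for a tanh RNN (assumed model).

    W: (n, n) recurrent weights, U: (n, m) input weights,
    inputs: (batch, T, m). Returns an array of shape (batch,).
    """
    batch, T, _ = inputs.shape
    if not 1 <= horizon <= T:
        raise ValueError("horizon must satisfy 1 <= horizon <= T")
    n = W.shape[0]
    norms = np.empty(batch)
    for b in range(batch):
        h = np.zeros(n)
        jacobians = []
        for t in range(T):
            h = np.tanh(W @ h + U @ inputs[b, t])
            # One-step Jacobian dh_t/dh_{t-1} = diag(1 - h_t^2) W
            jacobians.append((1.0 - h ** 2)[:, None] * W)
        # Chain rule: dh_T/dh_{T-horizon} = J_T J_{T-1} ... J_{T-horizon+1}
        J = np.eye(n)
        for Jt in reversed(jacobians[-horizon:]):
            J = J @ Jt
        norms[b] = np.linalg.norm(J, 2)  # spectral norm
    return norms


def select_minibatch(norms, low, high, batch_size, rng):
    """Pick examples whose norm lies in [low, high] (hypothetical thresholds).

    If too few examples qualify, fall back to those closest to the range.
    """
    in_range = np.flatnonzero((norms >= low) & (norms <= high))
    if in_range.size >= batch_size:
        return rng.choice(in_range, size=batch_size, replace=False)
    # Distance to the interval is zero for in-range examples
    dist = np.maximum(low - norms, 0.0) + np.maximum(norms - high, 0.0)
    return np.argsort(dist)[:batch_size]
```

Under these assumptions, a training loop would recompute the norms periodically (they change as W and U are updated) and draw each SGD mini-batch through `select_minibatch`, so that examples whose long-term gradients vanish or explode are used less often.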

Cite

Chernodub, A., & Nowicki, D. (2016). Sampling-based gradient regularization for capturing long-term dependencies in recurrent neural networks. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9948 LNCS, pp. 90–97). Springer Verlag. https://doi.org/10.1007/978-3-319-46672-9_11
