Vanishing (and exploding) gradients effect is a common problem for recurrent neural networks which use backpropagation method for calculation of derivatives. We construct an analytical framework to estimate a contribution of each training example to the norm of the long-term components of the target functions gradient and use it to hold the norm of the gradients in the suitable range. Using this subroutine we can construct mini-batches for the stochastic gradient descent (SGD) training that leads to high performance and accuracy of the trained network even for very complex tasks. To check our framework experimentally we use some special synthetic benchmarks for testing RNNs on ability to capture long-term dependencies. Our network can detect links between events in the (temporal) sequence at the range 100 and longer.
CITATION STYLE
Chernodub, A., & Nowicki, D. (2016). Sampling-based gradient regularization for capturing long-term dependencies in recurrent neural networks. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9948 LNCS, pp. 90–97). Springer Verlag. https://doi.org/10.1007/978-3-319-46672-9_11
Mendeley helps you to discover research relevant for your work.