If you cannot find anything more, look for something else (Bridget Fountain) 


WITÇ = Where Is The ÇonvNet?

Gilbert Groçon is known as the inventor of the cedilla. For here we speak of artificial intelligence and of Deep Learning, pardon, of "apprentissage profond" (deep learning, in proper French). NET is the new LET! Once, wavelets and their sister -lets were a leading trend. Now, NETs have become a gold standard. Let us name them:

[AlexNet] [BridgeNet] [CayleyNet] [ChebNet] [ConvNet] [dasNet] [DeConvNet] [DenseNet] [DeScatterNet] [DCFNet] [FitNet] [GoogLeNet] [ImageNet] [LeNet] [MobileNet] [ResNet] [ScatNet/ScatterNet] [ShuffleNet] [SparseNet] [SplineNet] [SqueezeNet] [TasNet] [VGGNet] [UNet] [WideResidualNet] [YedroudjNet] [ZFNet] 
We trained a large, deep convolutional neural network to classify the 1.3 million high-resolution images in the LSVRC-2010 ImageNet training set into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 39.7% and 18.9%, which is considerably better than the previous state-of-the-art results. The neural network, which has 60 million parameters and 500,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and two globally connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of convolutional nets. To reduce overfitting in the globally connected layers we employed a new regularization method that proved to be very effective.
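The non-saturating (ReLU) units and the final softmax mentioned in this abstract can be sketched in a few lines; the layer sizes below are toy values for illustration, not AlexNet's actual dimensions:

```python
import numpy as np

def relu(x):
    # non-saturating activation: identity for positive inputs, zero otherwise
    return np.maximum(x, 0.0)

def softmax(z):
    # numerically stable softmax over class scores
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
features = relu(rng.standard_normal(8))          # stand-in for pooled conv features
W, b = rng.standard_normal((5, 8)), np.zeros(5)  # toy fully connected layer
probs = softmax(W @ features + b)                # class probabilities, sum to 1
```

Unlike saturating units (tanh, sigmoid), the ReLU gradient does not vanish for large positive inputs, which is what speeds up training.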
Despite the remarkable progress achieved in automatic speech recognition, recognizing far-field speech mixed with various noise sources is still a challenging task. In this paper, we introduce a novel student-teacher transfer learning framework, BridgeNet, which can provide a solution to improve distant speech recognition. There are two key features in BridgeNet. First, BridgeNet extends traditional student-teacher frameworks by providing multiple hints from a teacher network. Hints are not limited to the soft labels from a teacher network: a teacher's intermediate feature representations can better guide a student network to learn how to denoise or dereverberate noisy input. Second, the proposed recursive architecture in BridgeNet can iteratively improve denoising and recognition performance. The experimental results of BridgeNet showed significant improvements in tackling the distant speech recognition problem, where it achieved up to 13.24% relative WER reduction on the AMI corpus compared to a baseline neural network without teacher's hints.
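The two kinds of hints described above (soft labels plus intermediate feature matching) can be combined into a single training loss. The sketch below is a minimal illustration under stated assumptions, not the paper's exact formulation; the `alpha` weighting and the L2 hint term are assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def bridge_loss(student_feat, teacher_feat, student_logits, teacher_logits, alpha=0.5):
    # soft-label hint: cross-entropy between teacher and student class distributions
    p_t, p_s = softmax(teacher_logits), softmax(student_logits)
    ce = -np.sum(p_t * np.log(p_s + 1e-12))
    # feature hint: L2 distance between intermediate representations
    hint = np.mean((student_feat - teacher_feat) ** 2)
    return alpha * ce + (1.0 - alpha) * hint
```

Matching intermediate features penalizes the student for representations that diverge from the teacher's, even when the final class predictions agree.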
The rise of graph-structured data such as social networks, regulatory networks, citation graphs, and functional brain networks, in combination with the resounding success of deep learning in various applications, has brought interest in generalizing deep learning models to non-Euclidean domains. In this paper, we introduce a new spectral-domain convolutional architecture for deep learning on graphs. The core ingredient of our model is a new class of parametric rational complex functions (Cayley polynomials) that allow efficient computation of localized regular filters on graphs, specializing on frequency bands of interest. Our model scales linearly with the size of the input data for sparsely connected graphs, can handle different constructions of Laplacian operators, and typically requires fewer parameters than previous models. Extensive experimental results show the superior performance of our approach on various graph learning problems.
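A Cayley-style spectral filter can be sketched via eigendecomposition on a small graph. The paper itself avoids explicit eigendecomposition for scalability; this is only meant to show the shape of the filter, and the coefficient values are arbitrary placeholders:

```python
import numpy as np

def cayley_filter(L, coeffs, h=1.0):
    # g(lam) = c0 + 2*Re( sum_j c_j * ((h*lam - i)/(h*lam + i))**j );
    # the Cayley transform maps real eigenvalues onto the complex unit circle
    lam, U = np.linalg.eigh(L)
    C = (h * lam - 1j) / (h * lam + 1j)
    g = np.full(lam.shape, float(coeffs[0]))
    for j, c in enumerate(coeffs[1:], start=1):
        g += 2.0 * np.real(c * C ** j)
    return U @ np.diag(g) @ U.T        # real, symmetric filter matrix g(L)

# 4-node path graph Laplacian; coefficients c_j are arbitrary for illustration
L = np.diag([1.0, 2.0, 2.0, 1.0])
L[0, 1] = L[1, 0] = L[1, 2] = L[2, 1] = L[2, 3] = L[3, 2] = -1.0
G = cayley_filter(L, [1.0, 0.5 + 0.2j])
```

Because g(L) is a function of the Laplacian, the filter commutes with L and shares its eigenvectors; the zoom parameter h dilates the spectrum to focus on a frequency band of interest.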
In this work, we are interested in generalizing convolutional neural networks (CNNs) from low-dimensional regular grids, where image, video and speech are represented, to high-dimensional irregular domains, such as social networks, brain connectomes or word embeddings, represented by graphs. We present a formulation of CNNs in the context of spectral graph theory, which provides the necessary mathematical background and efficient numerical schemes to design fast localized convolutional filters on graphs. Importantly, the proposed technique offers the same linear computational complexity and constant learning complexity as classical CNNs, while being universal to any graph structure. Experiments on MNIST and 20NEWS demonstrate the ability of this novel deep learning system to learn local, stationary, and compositional features on graphs.
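The fast localized filters rest on the Chebyshev recurrence T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x) applied to a rescaled Laplacian; a K-term filter touches only the K-hop neighborhood of each node. A minimal sketch, with hypothetical coefficients `theta`:

```python
import numpy as np

def cheb_filter(L, x, theta):
    # y = sum_k theta_k T_k(L~) x, with L~ = 2L/lmax - I (spectrum rescaled to [-1, 1])
    lmax = np.linalg.eigvalsh(L).max()
    Lt = 2.0 * L / lmax - np.eye(L.shape[0])
    T_prev, T_curr = x, Lt @ x                 # T_0 x and T_1 x
    y = theta[0] * T_prev
    if len(theta) > 1:
        y = y + theta[1] * T_curr
    for k in range(2, len(theta)):
        T_next = 2.0 * Lt @ T_curr - T_prev    # Chebyshev recurrence
        y = y + theta[k] * T_next
        T_prev, T_curr = T_curr, T_next
    return y

# 6-node path graph Laplacian
A = np.eye(6, k=1) + np.eye(6, k=-1)
L = np.diag(A.sum(1)) - A
delta = np.zeros(6); delta[0] = 1.0
y = cheb_filter(L, delta, [0.0, 1.0])          # one-hop filter applied to a delta
```

Each term costs one sparse matrix-vector product, which is where the linear complexity in the input size comes from.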
In machine learning, a convolutional neural network (CNN, or ConvNet) is a class of deep, feedforward artificial neural networks that has successfully been applied to analyzing visual imagery. CNNs use a variation of multilayer perceptrons designed to require minimal preprocessing. They are also known as shift-invariant or space-invariant artificial neural networks (SIANN), based on their shared-weights architecture and translation invariance characteristics.
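The shared-weights and translation properties can be illustrated with a bare-bones 2-D convolution: one kernel slides over every position, so shifting the input shifts the output by the same amount (a minimal sketch, ignoring strides, padding and channels):

```python
import numpy as np

def conv2d_valid(img, kernel):
    # 'valid' 2-D correlation: the same (shared) kernel is applied at every position
    H, W = img.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

rng = np.random.default_rng(1)
img = rng.standard_normal((6, 6))
k = rng.standard_normal((3, 3))
out1 = conv2d_valid(img, k)
shifted = np.zeros_like(img); shifted[1:, 1:] = img[:-1, :-1]  # shift input by (1, 1)
out2 = conv2d_valid(shifted, k)                                # output shifts by (1, 1)
```

Weight sharing also explains the parameter economy: the kernel has kh*kw parameters regardless of the image size.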
Traditional convolutional neural networks (CNN) are stationary and feedforward. They neither change their parameters during evaluation nor use feedback from higher to lower layers. Real brains, however, do. So does our Deep Attention Selective Network (dasNet) architecture. DasNet's feedback structure can dynamically alter its convolutional filter sensitivities during classification. It harnesses the power of sequential processing to improve classification performance, by allowing the network to iteratively focus its internal attention on some of its convolutional filters. Feedback is trained through direct policy search in a huge million-dimensional parameter space, through scalable natural evolution strategies (SNES). On the CIFAR-10 and CIFAR-100 datasets, dasNet outperforms the previous state-of-the-art model.
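A toy stand-in for the feedback idea: attention weights rescale each filter's response map over several evaluation steps. The update rule below is a fixed illustrative policy, not the SNES-trained one from the paper, and the 1-D signal and kernels are placeholders:

```python
import numpy as np

def dasnet_forward(x, filters, steps=3):
    # x: 1-D signal; filters: list of 1-D kernels; a: per-filter attention weights
    a = np.ones(len(filters))                        # uniform attention at step 0
    for _ in range(steps):
        maps = [np.maximum(a[k] * np.convolve(x, f, mode='same'), 0.0)
                for k, f in enumerate(filters)]
        stats = np.array([m.mean() for m in maps])   # summary fed to the "policy"
        a = len(filters) * np.exp(stats) / np.exp(stats).sum()  # toy softmax rule
    return maps, a

x = np.sin(np.linspace(0.0, 6.0, 50))
maps, a = dasnet_forward(x, [np.array([1.0, -1.0]), np.ones(3) / 3.0])
```

The point of the loop is that evaluation is no longer a single feedforward pass: each pass changes which filters the next pass emphasizes.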
We propose a novel semantic segmentation algorithm by learning a deconvolution network. We learn the network on top of the convolutional layers adopted from the VGG 16-layer net. The deconvolution network is composed of deconvolution and unpooling layers, which identify pixel-wise class labels and predict segmentation masks. We apply the trained network to each proposal in an input image, and construct the final semantic segmentation map by combining the results from all proposals in a simple manner. The proposed algorithm mitigates the limitations of existing methods based on fully convolutional networks by integrating a deep deconvolution network with proposal-wise prediction; our segmentation method typically identifies detailed structures and handles objects at multiple scales naturally. Our network demonstrates outstanding performance on the PASCAL VOC 2012 dataset, and we achieve the best accuracy (72.5%) among the methods trained with no external data through an ensemble with the fully convolutional network.
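Unpooling, the less familiar of the two layer types, can be sketched by recording argmax locations ("switches") during max pooling and restoring values at those locations on the way back up. This is a minimal sketch of the mechanism, not the paper's implementation:

```python
import numpy as np

def maxpool_with_switches(x, s=2):
    # s x s max pooling that records each argmax location (the "switch")
    out = np.zeros((x.shape[0] // s, x.shape[1] // s))
    sw = np.zeros_like(out, dtype=int)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i*s:(i+1)*s, j*s:(j+1)*s]
            sw[i, j] = patch.argmax()
            out[i, j] = patch.flat[sw[i, j]]
    return out, sw

def unpool(y, sw, s=2):
    # place each pooled value back at its recorded location, zeros elsewhere
    x = np.zeros((y.shape[0] * s, y.shape[1] * s))
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            di, dj = divmod(sw[i, j], s)
            x[i*s + di, j*s + dj] = y[i, j]
    return x

rng = np.random.default_rng(2)
act = rng.random((4, 4)) + 0.1          # positive toy activations
pooled, switches = maxpool_with_switches(act)
restored = unpool(pooled, switches)
```

The switches let the decoder put activations back at their original spatial positions, which is what preserves the detailed structures mentioned above.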
This paper constructs translation-invariant operators on L2(R^d), which are Lipschitz-continuous to the action of diffeomorphisms. A scattering propagator is a path-ordered product of non-linear and non-commuting operators, each of which computes the modulus of a wavelet transform. A local integration defines a windowed scattering transform, which is proved to be Lipschitz-continuous to the action of diffeomorphisms. As the window size increases, it converges to a wavelet scattering transform which is translation invariant. Scattering coefficients also provide representations of stationary processes. Expected values depend upon high-order moments and can discriminate processes having the same power spectrum. Scattering operators are extended on L2(G), where G is a compact Lie group, and are invariant under the action of G. Combining a scattering on L2(R^d) and on L2(SO(d)) defines a translation- and rotation-invariant scattering on L2(R^d).
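The windowed scattering coefficients |x * psi_j| * phi can be sketched in one dimension with circular convolutions. With a fully averaging window phi, the first-order coefficients become exactly invariant to circular shifts; the filters below are arbitrary placeholders, not a proper wavelet family:

```python
import numpy as np

def cconv(a, b):
    # circular convolution via FFT (both signals of the same length)
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def scattering(x, psis, phi):
    # zeroth- and first-order windowed scattering: S0 = x*phi, S1_j = |x*psi_j|*phi
    S0 = cconv(x, phi)
    S1 = [cconv(np.abs(cconv(x, psi)), phi) for psi in psis]
    return S0, S1

N = 64
phi = np.ones(N) / N                             # fully averaging window
psi = np.zeros(N); psi[0], psi[1] = 1.0, -1.0    # crude high-pass placeholder
x = np.sin(2 * np.pi * np.arange(N) * 4 / N)
S0a, S1a = scattering(x, [psi], phi)
S0b, S1b = scattering(np.roll(x, 5), [psi], phi)  # circularly shifted input
```

The modulus is what makes this work: it discards the phase carried by the shift while keeping the energy localized in each wavelet band.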
Major winning convolutional neural networks (CNNs), such as VGGNet, ResNet, DenseNet, etc., include tens to hundreds of millions of parameters, which impose considerable computation and memory overheads. This limits their practical use in training and optimizing for real-world applications. On the contrary, lightweight architectures, such as SqueezeNet, are being proposed to address this issue. However, they mainly suffer from low accuracy, as they compromise between processing power and efficiency. These inefficiencies mostly stem from following an ad-hoc design procedure. In this work, we discuss and propose several crucial design principles for an efficient architecture design and elaborate intuitions concerning different aspects of the design procedure. Furthermore, we introduce a new layer called SAF-pooling to improve the generalization power of the network while keeping it simple, by choosing the best features. Based on such principles, we propose a simple architecture called SimpNet. We empirically show that SimpNet provides a good trade-off between computation/memory efficiency and accuracy solely based on these primitive but crucial principles. SimpNet outperforms deeper and more complex architectures such as VGGNet, ResNet, WideResidualNet etc., on several well-known benchmarks, while having 2 to 25 times fewer parameters and operations. We obtain state-of-the-art results (in terms of a balance between accuracy and the number of involved parameters) on standard datasets, such as CIFAR-10, CIFAR-100, MNIST and SVHN. The implementations are available at https://github.com/Coderx7/SimpNet.
Robust speech processing in multi-talker environments requires effective speech separation. Recent deep learning systems have made significant progress toward solving this problem, yet it remains challenging, particularly in real-time, short-latency applications. Most methods attempt to construct a mask for each source in a time-frequency representation of the mixture signal, which is not necessarily an optimal representation for speech separation. In addition, time-frequency decomposition results in inherent problems such as phase/magnitude decoupling and the long time window required to achieve sufficient frequency resolution. We propose the Time-domain Audio Separation Network (TasNet) to overcome these limitations. We directly model the signal in the time domain using an encoder-decoder framework and perform the source separation on non-negative encoder outputs. This method removes the frequency decomposition step and reduces the separation problem to the estimation of source masks on encoder outputs, which are then synthesized by the decoder. Our system outperforms the current state-of-the-art causal speech separation algorithms, reduces the computational cost of speech separation, and significantly reduces the minimum required latency of the output. This makes TasNet suitable for applications where a low-power, real-time implementation is desirable, such as in hearable and telecommunication devices.
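The encoder-mask-decoder step can be sketched as follows. The basis matrix and masks are random placeholders rather than learned quantities; the sketch shows one consequence of the design: when the per-source masks sum to one, the separated frames sum back to the mixture's full reconstruction:

```python
import numpy as np

def tasnet_step(frames, basis, masks):
    # frames: (T, L) time-domain segments; basis: (N, L) decoder basis signals;
    # masks: (S, T, N) per-source weights in [0, 1]
    w = np.maximum(frames @ basis.T, 0.0)               # non-negative encoder outputs
    recon = np.stack([(m * w) @ basis for m in masks])  # (S, T, L) per-source frames
    return w, recon

rng = np.random.default_rng(3)
frames = rng.standard_normal((5, 8))
basis = rng.standard_normal((12, 8))
m = rng.random((5, 12))
masks = np.stack([m, 1.0 - m])       # two sources, masks summing to one
w, recon = tasnet_step(frames, basis, masks)
```

Because everything stays in the time domain, there is no STFT window to wait for, which is where the latency reduction comes from.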
For about 10 years, detecting the presence of a secret message hidden in an image was performed with an Ensemble Classifier trained on Rich features. In recent years, studies such as that of Xu et al. have indicated that well-designed convolutional neural networks (CNN) can achieve comparable performance to the two-step machine learning approaches. In this paper, we propose a CNN that outperforms the state of the art in terms of error probability. The proposal builds directly on recently published work and is a clever fusion of important building blocks used in various papers. Among the essential parts of the CNN, one can cite the use of a preprocessing filter bank and a Truncation activation function, five convolutional layers with Batch Normalization associated with a Scale Layer, as well as the use of a sufficiently sized fully connected section. An augmented database has also been used to improve the training of the CNN. Our CNN was experimentally evaluated against the S-UNIWARD and WOW embedding algorithms, and its performance was compared with that of three other methods: an Ensemble Classifier plus a Rich Model, and two other CNN steganalyzers.
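The preprocessing filter and truncation activation can be illustrated as follows. The 5x5 "KV" kernel is one commonly used high-pass residual filter from the SRM family (an assumption here: the paper's actual filter bank may differ), and the threshold T=3 is illustrative:

```python
import numpy as np

# 5x5 "KV" high-pass residual kernel from the SRM family; rows sum to zero,
# so flat image regions are suppressed and stego noise residuals stand out
KV = np.array([[-1,  2,  -2,  2, -1],
               [ 2, -6,   8, -6,  2],
               [-2,  8, -12,  8, -2],
               [ 2, -6,   8, -6,  2],
               [-1,  2,  -2,  2, -1]], dtype=float) / 12.0

def truncation(x, T=3.0):
    # Truncation activation: clip filter responses to [-T, T] so that rare
    # large residuals do not dominate the subsequent layers
    return np.clip(x, -T, T)

residual = truncation(np.array([5.0, -4.5, 0.7, -0.2]))
```

Bounding the residuals keeps the CNN focused on the low-amplitude noise component where the embedding lives.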