Timing  MW 3:30 - 5:00 pm
Location  EE C241 (MMCR 1st Floor) 
Instructor  Sriram Ganapathy 
Office  C 334 (2nd Floor) 
sriram aT ee doT iisc doT ernet doT in  
Teaching Assistant  Achuth Rao 
Lab  C 326 (2nd Floor) 
achuthraomv aT gmail doT com  
TA Hours  Thu 3-5 pm
Announcements
 Final exam date and time: Dec 7, 1:30pm - 4:30pm, MMCR
 Open book, open notes. No laptops/cellphones allowed.  Take home exam2 posted here 
 Project evaluation: Dec 19, 9:30am - 1pm, MMCR
 Single person projects (max 10 min presentation) (max 5 slides)
 Multi person projects  both individuals presenting (max 15 min presentation) (max 8 slides)
 Project components  Implementation of baseline paper, comparing results with baseline paper, novel directions improving the baseline.
 Project report (max single column 5 pages)  Due date Dec 17
 Project slides emailed by Dec 18
 Mark distribution (Total Marks 30)  Midterm Evaluation (7), Final presentation (5), Report (5), Baseline implementation (7), Novelty (6).
Syllabus
 Introduction to real world signals - text, speech, image, video.
 Feature extraction and front-end signal processing - information rich representations, robustness to noise and artifacts, signal enhancement, bio-inspired feature extraction.
 Basics of pattern recognition, Generative modeling - Gaussian and mixture Gaussian models, hidden Markov models, factor analysis and latent variable models.
 Discriminative modeling - support vector machines, neural networks and back propagation.
 Introduction to deep learning - convolutional and recurrent networks, pretraining and practical considerations in deep learning, understanding deep networks.
 Clustering methods and decision trees. Feature selection methods.
 Applications in computer vision and speech recognition.
Grading Details
Assignments  15% 
Midterm exam.  20% 
Final exam.  35% 
Project  30% 
Prerequisites
 Random Process/Probability and Statistics
 Linear Algebra/Matrix Theory
 Basic Digital Signal Processing/Signals and Systems
Textbooks
 “Pattern Recognition and Machine Learning”, C.M. Bishop, 2nd Edition, Springer, 2011.
 “Deep Learning”, I. Goodfellow, Y. Bengio, A. Courville, MIT Press, 2016. html
 “Digital Image Processing”, R. C. Gonzalez, R. E. Woods, 3rd Edition, Prentice Hall, 2008.
 “Fundamentals of Speech Recognition”, L. Rabiner and B.-H. Juang, Prentice Hall, 1993.
References
 “Deep Learning: Methods and Applications”, Li Deng, Microsoft Technical Report.
 “Automatic Speech Recognition: A Deep Learning Approach”, D. Yu, L. Deng, Springer, 2014.
 “Machine Learning for Audio, Image and Video Analysis”, F. Camastra, A. Vinciarelli, Springer, 2007. pdf
Slides
03-08-2016  Introduction to real world signals - text, speech, image, video. Learning as a pattern recognition problem. Examples. Roadmap of the course.
slides 
08-08-2016  Types of learning methods, feature extraction for speech and audio, short-term Fourier transform, narrowband and wideband spectrogram, time-frequency resolution. Refs - Dan Ellis-Tutorial, Ricardo-Tutorial
slides 
10-08-2016  Uncorrelated noise in speech/audio, non-negative matrix factorization (NMF), problem definition, cost function and constraints. Auxiliary function, proof of convergence, parameter update rule. Application to audio source separation and speech denoising. Refs - Bhiksha Raj-Tutorial, Lee-Paper
slides 
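A minimal sketch of the NMF parameter update rule from the 10-08-2016 lecture, assuming the squared-error (Frobenius) cost and multiplicative updates in the style of Lee and Seung. The helper names (matmul, nmf), the toy matrix V and the small constant eps are illustrative assumptions, not taken from the course slides.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols_B = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols_B] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh))

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor V (n x m, non-negative) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random initialization keeps the factors non-negative.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(V, W, H)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
        errors.append(frobenius_error(V, W, H))
    return W, H, errors

# Toy non-negative matrix (hypothetical data).
V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 1.0, 1.0], [3.0, 5.0, 7.0]]
W, H, errors = nmf(V, rank=2)
```

The list errors records the cost after every iteration, so the convergence behaviour discussed in the lecture can be inspected directly.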
17-08-2016  Linear Prediction - orthogonality of prediction error with past samples, optimal linear predictor, Yule-Walker equations, energy of prediction error, stability of prediction filter, autoregressive (AR) process, linear prediction for AR process. Ref - Theory of LP - Vaidyanathan [Chap. 2, 5.3, A, B]
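A short sketch of solving the Yule-Walker equations with the Levinson-Durbin recursion, assuming the convention x[n] ~ a1 x[n-1] + ... + ap x[n-p]. The function name and the AR(2) test values (0.5, 0.3) are illustrative assumptions; the autocorrelation sequence is computed analytically from those values rather than estimated from data.

```python
def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for predictor coefficients a[1..order].

    r is the autocorrelation sequence r[0..order]. Returns the coefficients
    and the final prediction error energy.
    """
    a = [0.0] * (order + 1)  # a[k] multiplies x[n-k]; a[0] is unused
    err = r[0]               # error energy of the order-0 predictor
    for m in range(1, order + 1):
        # Reflection coefficient for order m.
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / err
        new_a = a[:]
        new_a[m] = k_m
        for k in range(1, m):
            new_a[k] = a[k] - k_m * a[m - k]
        a = new_a
        err *= (1.0 - k_m ** 2)
    return a[1:], err

# Normalized autocorrelation of a stable AR(2) process with a1 = 0.5, a2 = 0.3,
# obtained from the Yule-Walker relations themselves (hypothetical example).
a1_true, a2_true = 0.5, 0.3
rho1 = a1_true / (1.0 - a2_true)
rho2 = a1_true * rho1 + a2_true
coeffs, error_energy = levinson_durbin([1.0, rho1, rho2], order=2)
```

For an AR process of order p, a predictor of the same order should recover the generating coefficients, which is what this example checks.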

22-08-2016  Normal equations for autoregressive process. Power spectral density (PSD). Autoregressive modeling of PSD. Applications of linear prediction. First assignment - Non-negative Matrix Factorization, Linear Prediction, Applications for face images and noisy speech. Due Date - 02-09-2016 (Noon)
slides
HW1.pdf images.zip speech.zip 
24-08-2016  Matrix derivative rules. Dimensionality Reduction I - Principal component analysis (PCA), maximum variance formulation, minimum error formulation. Whitening and standardization, PCA for high dimensional data. Linear discriminant analysis (LDA), Fisher discriminant for two classes. Ref - PRML - Bishop
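A small sketch of the maximum variance formulation of PCA: the first principal component is the unit vector that maximizes the projected variance, i.e. the leading eigenvector of the sample covariance. Power iteration is used here only to stay within the standard library; it is an assumption of this sketch, not the method prescribed in the lecture, and the data points are made up.

```python
import math

def covariance(data):
    """Return the sample mean and (biased) covariance matrix of the rows of data."""
    n, d = len(data), len(data[0])
    mean = [sum(x[j] for x in data) / n for j in range(d)]
    centered = [[x[j] - mean[j] for j in range(d)] for x in data]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    return mean, cov

def first_principal_component(cov, n_iter=100):
    """Leading eigenvector of cov by power iteration, plus the variance along it."""
    d = len(cov)
    # Assumes the start vector is not orthogonal to the leading eigenvector.
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, variance

# Hypothetical 2-D points lying close to the line y = 2x.
data = [(-2.0, -4.1), (-1.0, -1.9), (0.0, 0.1), (1.0, 2.0), (2.0, 3.9)]
mean, cov = covariance(data)
direction, variance = first_principal_component(cov)
```

The variance along the returned direction is at least as large as the variance along either coordinate axis, which is the property the maximum variance formulation asks for.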

29-08-2016  LDA for multiple classes, LDA formulation in lower dimensional subspace. Applications of PCA. Distinction between PCA and LDA. Introduction to feature extraction from image data - Wavelet transform, mother wavelet, scaling and shifting, Continuous and Dyadic Wavelet Transform. Ref - Introduction to Wavelets and Wavelet Transforms - Burrus et al.

31-08-2016  Dyadic Wavelet Transform. Scaling and Wavelet Function. Approximation and Detail. Wavelet decomposition. Application to 1D signals. Ref - Selected Pages - Burrus et al. (Chap. 2)
handout 
05-09-2016  Interpreting wavelet approximation and detail coefficients. Filter bank approach to Wavelets. Extension to 2D Wavelet Transform, Application to Images. Ref - Tutorial on 2D Wavelets, Image Denoising
handout 
07-09-2016  Decision Theory - Inference and decision rule, misclassification error, maximum posterior decision rule, expected loss, minimum mean square error decision rule for regression. Three approaches to inference and decision - Generative modeling, Discriminative modeling and Discriminant Functions. Ref - PRML - Bishop (Sec. 1.5)

12-09-2016  Introduction to generative modeling. Gaussian distribution. Parameter estimation using maximum likelihood (MLE). Sample mean and sample covariance. Limitations of Gaussian modeling. Gaussian mixture model (GMM) density function.

slides 
14-09-2016  MLE for GMM - Expectation Maximization (EM) algorithm. Proof of EM algorithm. Convergence properties. EM algorithm for GMM parameter estimation. Choice of hidden variable. Application of GMMs for unsupervised data clustering. Ref - Tutorial GMMs, Proof of EM algorithm, EM algorithm for GMMs
slides 
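A minimal sketch of the EM algorithm for GMM parameter estimation covered on 14-09-2016, restricted to one-dimensional data with two components. The synthetic data, the initial parameter values and the function names are assumptions made for illustration only.

```python
import math
import random

def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, n_iter=50):
    """Run EM for a 1-D GMM; returns updated parameters and the log-likelihood trace."""
    means, variances, weights = list(means), list(variances), list(weights)
    K = len(means)
    log_liks = []
    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) of each component per sample.
        resp, ll = [], 0.0
        for x in data:
            joint = [weights[k] * gaussian_pdf(x, means[k], variances[k])
                     for k in range(K)]
            total = sum(joint)
            ll += math.log(total)
            resp.append([j / total for j in joint])
        log_liks.append(ll)  # log-likelihood under the parameters before this M-step
        # M-step: responsibility-weighted mean, variance and mixture weight.
        for k in range(K):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = sum(r[k] * (x - means[k]) ** 2
                               for r, x in zip(resp, data)) / nk
            weights[k] = nk / len(data)
    return means, variances, weights, log_liks

# Synthetic data from two well-separated Gaussians (hypothetical example).
rng = random.Random(0)
data = ([rng.gauss(-2.0, 1.0) for _ in range(200)]
        + [rng.gauss(3.0, 0.5) for _ in range(200)])
means, variances, weights, log_liks = em_gmm_1d(
    data, means=[-1.0, 1.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
```

The trace log_liks should never decrease from one iteration to the next, which is the convergence property proved in the lecture.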
19-09-2016  Markov chain - sequence modeling with hidden Markov models (HMMs). Definition of HMM parameters. Three problems in HMM (i) Evaluation (ii) Inference and (iii) Training. Direct computation of likelihood. Forward and backward variable recursion.
Ref - Rabiner, Juang, "Fundamentals of speech recognition", Chap 6. Ref - SP Magazine Article - Rabiner

21-09-2016  Solution to problem (ii) in HMM - Viterbi algorithm. HMM parameter estimation with EM algorithm. Estimation of Q function and iterative model update. Ref - Tutorial HMMs, Rabiner, Juang, "Fundamentals of speech recognition", Chap 6. Second assignment (Part A) - PCA/LDA, ML, Gaussian and GMM, HMM. Due Date - 03-10-2016 (Class)
HW2a.pdf 
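A compact sketch of the Viterbi algorithm for problem (ii), finding the most likely hidden state sequence of a discrete-observation HMM. The two-state model and all probability values below are a made-up toy example, and states and observations are encoded as integer indices.

```python
def viterbi(obs, start_prob, trans_prob, emit_prob):
    """Return the most likely state sequence and its joint probability."""
    n_states = len(start_prob)
    # delta[s]: probability of the best path ending in state s at the current time.
    delta = [start_prob[s] * emit_prob[s][obs[0]] for s in range(n_states)]
    backpointers = []
    for o in obs[1:]:
        new_delta, pointers = [], []
        for s in range(n_states):
            # Best predecessor state for s.
            best_prev = max(range(n_states), key=lambda p: delta[p] * trans_prob[p][s])
            pointers.append(best_prev)
            new_delta.append(delta[best_prev] * trans_prob[best_prev][s] * emit_prob[s][o])
        delta = new_delta
        backpointers.append(pointers)
    # Backtrack from the best final state.
    last = max(range(n_states), key=lambda s: delta[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]

# Toy HMM with 2 states and 3 observation symbols (hypothetical values).
start_prob = [0.6, 0.4]
trans_prob = [[0.7, 0.3],
              [0.4, 0.6]]
emit_prob = [[0.5, 0.4, 0.1],
             [0.1, 0.3, 0.6]]
best_path, best_prob = viterbi([0, 1, 2], start_prob, trans_prob, emit_prob)
```

For long sequences the products underflow, so a practical implementation would typically work with log probabilities; plain probabilities are kept here for readability.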
26-09-2016  First Midterm Exam

28-09-2016
Discussion on first midterm exam. Topics for mini-projects
Project list 
03-10-2016  Hidden Markov Models with GMM observation densities. Application of EM algorithm for GMM-HMMs. Parameter estimation. Application of HMMs in video analysis. Dimensionality reduction continued - latent variable models. Refs - GMM-HMM - "Fundamentals of Speech Recognition", Rabiner, Chap 6. Slides from N. Ramanathan Video analysis with HMMs

05-10-2016  Probabilistic PCA (PPCA) - generative model description. Log-likelihood computation, Parameter estimation using direct optimization. EM algorithm for PPCA. Extension to factor analysis. Summary of generative modeling. Introduction to discriminative modeling - Nonlinear regression with kernels. Ref - PRML - Bishop (Sec. 12.2, 3.1), Paper - "PPCA", Tipping et al.

12-10-2016  Recap of generative versus discriminative modeling. Nonlinear regression with regularization. Dual problem definition and solution with kernels. Properties of kernel functions. Constructing kernels from basic blocks. Sparse kernel machines. Ref - PRML - Bishop (Sec. 3.3, 6)
slides 
17-10-2016
Classifiers with kernels. Definition of margin. Maximum margin classifiers. Introduction to convex optimization with constraints. Primal and dual problems. Weak and strong duality. Karush-Kuhn-Tucker (KKT) conditions for strong duality. Solving the dual problem for maximum margin classifiers. Definition of support vectors.
Ref - PRML - Bishop (Chap 7.1), Book (Chap 5) - "Convex Optimization", Boyd and Vandenberghe. Second assignment (Part B) - Implementing PCA/LDA, GMM and HMM. Due Date - 28-10-2016 (Noon)
HW2b.pdf 
19-10-2016
Maximum margin classifiers for overlapping class distributions, concept of slack variables. Lagrangian and dual form. KKT conditions for solving the optimal parameters. Sequential minimal optimization algorithm - analytic solution to two variable constrained optimization problem, heuristics for choosing the two variables. Estimating the bias parameter in SVM.
Ref - PRML - Bishop (Chap 7.1.1), SMO paper - J. Platt et al.

24-10-2016
Summary of support vector machines - problem definition, primal and dual formulations, kernel space transformation, solutions and implications, applications of SVMs in cancer diagnosis and text categorization. Support vector regression - slack variables and dual formulation.
Ref - PRML - Bishop (Chap 7.1.4), NYU Bio medicine - Tutorial
slides 
31-10-2016
Introduction to neural networks. Illustration with XOR problem - need for hidden layer(s) with nonlinear activations. Optimization methods for neural networks. First order Taylor series - Gradient descent method. Curvature and second derivatives. Jacobian and Hessian matrices. Newton's method. Stochastic gradient descent.
Ref - DLB (Deep Learning Book) - Goodfellow, Bengio (Chap 6, Chap 4.3)
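A small sketch contrasting gradient descent (first order Taylor series) with Newton's method (which also uses the Hessian) on a two-variable quadratic. The matrix A, vector b, starting point and learning rate are arbitrary illustrative choices, not values from the lecture.

```python
# Quadratic objective f(w) = 0.5 * w^T A w - b^T w with a positive definite A.
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]

def loss(w):
    Aw = [A[0][0] * w[0] + A[0][1] * w[1], A[1][0] * w[0] + A[1][1] * w[1]]
    return 0.5 * (w[0] * Aw[0] + w[1] * Aw[1]) - (b[0] * w[0] + b[1] * w[1])

def grad(w):
    """Gradient of f, equal to A w - b."""
    return [A[0][0] * w[0] + A[0][1] * w[1] - b[0],
            A[1][0] * w[0] + A[1][1] * w[1] - b[1]]

def gradient_descent(w, lr=0.1, n_steps=200):
    """Repeated first-order steps w <- w - lr * grad(w)."""
    for _ in range(n_steps):
        g = grad(w)
        w = [w[0] - lr * g[0], w[1] - lr * g[1]]
    return w

def newton_step(w):
    """One step w <- w - H^{-1} grad(w); the Hessian of f is A itself."""
    g = grad(w)
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    inv = [[A[1][1] / det, -A[0][1] / det],
           [-A[1][0] / det, A[0][0] / det]]
    return [w[0] - (inv[0][0] * g[0] + inv[0][1] * g[1]),
            w[1] - (inv[1][0] * g[0] + inv[1][1] * g[1])]

w0 = [2.0, -1.0]
w_gd = gradient_descent(w0)
w_newton = newton_step(w0)
```

On a quadratic, a single Newton step lands on the minimizer (here the solution of A w = b), while gradient descent approaches it gradually at a rate governed by the learning rate and the curvature.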

02-11-2016
Neural networks estimate posterior probabilities. Architecture considerations - cost function (mean square error, cross entropy), output units (linear, sigmoidal or softmax), hidden unit activations (ReLU and variants, tanh or sigmoidal).
Ref - DLB (Deep Learning Book) - Goodfellow, Bengio (Chap 6)
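A brief sketch of the softmax output unit paired with the cross entropy cost. It checks numerically the standard result that the gradient with respect to the logits is the predicted posterior minus the one-hot target. The logit values and function names are illustrative assumptions.

```python
import math

def softmax(z):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(z, target):
    """Negative log posterior of the target class given logits z."""
    return -math.log(softmax(z)[target])

def analytic_grad(z, target):
    """Gradient of the cross entropy w.r.t. the logits: softmax(z) - one_hot(target)."""
    p = softmax(z)
    return [p_k - (1.0 if k == target else 0.0) for k, p_k in enumerate(p)]

def numeric_grad(z, target, h=1e-6):
    """Central finite-difference approximation of the same gradient."""
    grad = []
    for k in range(len(z)):
        z_plus, z_minus = z[:], z[:]
        z_plus[k] += h
        z_minus[k] -= h
        grad.append((cross_entropy(z_plus, target)
                     - cross_entropy(z_minus, target)) / (2.0 * h))
    return grad

logits = [2.0, -1.0, 0.5]   # hypothetical network outputs before the softmax
target = 0
posteriors = softmax(logits)
g_analytic = analytic_grad(logits, target)
g_numeric = numeric_grad(logits, target)
```

The simple form of this gradient is one reason the softmax and cross entropy pairing is convenient for back propagation.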

07-11-2016
Universal approximation properties of NNs. Need for multiple hidden layers. Depth versus width. Mechanism of representation learning in deep networks. Parameter learning in deep networks - back propagation. Equivalence in learning DNNs with linear output activation and MSE versus softmax activations with cross entropy error.
Ref - DLB (Deep Learning Book) - Goodfellow, Bengio (Chap 6), ASR - DL approach, D. Yu, Li Deng (Chap 4).

09-11-2016
Summary of NN learning and architecture. Pseudo code for back propagation. Other considerations - data preprocessing, model initialization. Underfit versus overfit. Improving generalization with regularization. L2 regularization. Quadratic approximation and
Ref - DLB (Deep Learning Book) - Goodfellow, Bengio (Chap 6, 7)
slides 
14-11-2016
L1 regularization. Multitask learning. Early stopping. Equivalence between L2 regularization and early stopping. Bagging and ensemble averaging. Dropout. Ref - DLB (Deep Learning Book) - Goodfellow, Bengio (Chap 7)

16-11-2016
Convolutional neural networks. Filtering and hierarchical sparsity. Pooling and striding. Deep convolutional networks. Discussion on second midterm exam. Ref - DLB (Deep Learning Book) - Goodfellow, Bengio (Chap 9)

21-11-2016
Deep Generative Models - Restricted Boltzmann Machine, model definition, conditional independence. Relationship with sigmoidal activation.
Parameter learning in RBM - positive and negative phase, approximation with sampling methods, contrastive divergence algorithm.
Deep Belief Networks (DBNs). Ref - DLB (Deep Learning Book) - Goodfellow, Bengio (Chap 18, 20)

23-11-2016
Gaussian Restricted Boltzmann Machine (GRBM). Relationship with GMMs. Summary of Deep learning methods. Ref - ASR - Deep Learning Approach (Yu and Deng) - (Chap 5)
slides 
29-11-2016
Take Home Practice Exam

Qpaper 