When: MW 3:30 - 5:00 pm
Where: EE B303 (Second class onwards)
Who: Sriram Ganapathy
Office: C 334 (2nd Floor)
Email: sriram aT ee doT iisc doT ernet doT in
Teaching Assistant: Aravind Illa
Lab: C 326 (2nd Floor)
Email: aravindece77 aT gmail doT com
Announcements
 Final Exam will be on 10-12-2017 (02:00 pm - 05:00 pm), B303 (Classroom).
 Open book, open notes. No laptops/cellphones allowed.
 Practice questions posted here.
 Project Evaluation: 15-12-2017 (9:00 am).
 Maximum 8 slides (single person) or 12 slides (2 persons) per project.
 Maximum 3-4 page report (submit report and slides by noon Dec. 14 through mail).
 Evaluation criteria: Focus on problem definition and motivation, implementing the baseline and your contribution.
 Feedback Form link
 Fifth assignment
 Posted here. Due 24-11-2017.
Syllabus
 Introduction to real world signals - text, speech, image, video.
 Feature extraction and front-end signal processing - information rich representations, robustness to noise and artifacts, signal enhancement, bio-inspired feature extraction.
 Basics of pattern recognition. Generative modeling - Gaussian and mixture Gaussian models, hidden Markov models, factor analysis.
 Discriminative modeling - support vector machines, neural networks and back propagation.
 Introduction to deep learning - convolutional and recurrent networks, pre-training and practical considerations in deep learning, understanding deep networks.
 Deep generative models - Autoencoders, Boltzmann machines, Adversarial Networks.
 Applications in computer vision and speech recognition.
Grading Details
Assignments: 15%
Midterm exam: 20%
Final exam: 35%
Project: 30%
Prerequisites
 Random Process/Probability and Statistics
 Linear Algebra/Matrix Theory
 Basic Digital Signal Processing/Signals and Systems
Textbooks
 “Pattern Recognition and Machine Learning”, C.M. Bishop, 2nd Edition, Springer, 2011.
 “Neural Networks”, C.M. Bishop, Oxford Press, 1995.
 “Deep Learning”, I. Goodfellow, Y. Bengio, A. Courville, MIT Press, 2016. html
 “Digital Image Processing”, R. C. Gonzalez, R. E. Woods, 3rd Edition, Prentice Hall, 2008.
 “Fundamentals of Speech Recognition”, L. Rabiner and B.-H. Juang, Prentice Hall, 1993.
References
 “Deep Learning : Methods and Applications”, Li Deng, Microsoft Technical Report.
 “Automatic Speech Recognition: A Deep Learning Approach”, D. Yu, L. Deng, Springer, 2014.
 “Machine Learning for Audio, Image and Video Analysis”, F. Camastra, A. Vinciarelli, Springer, 2007. pdf
Slides
14-08-2017: Introduction to real world signals - text, speech, image, video. Learning as a pattern recognition problem. Examples. Roadmap of the course.
slides 

16-08-2017: Feature Extraction - Goals and challenges. Introduction to text processing. Bag of words model. Term Frequency - Inverse Document Frequency (TF-IDF). N-gram modeling. Feature Extraction in Audio and Speech - Spectrogram.
slides 
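
As a rough illustration of the bag of words and TF-IDF ideas from this lecture, here is a minimal sketch. The function name `tf_idf`, the whitespace tokenizer, and the unsmoothed weighting tf * log(N / df) are assumptions made for this example; the lecture slides may use a different variant.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (illustrative helper)."""
    # Bag of words: lower-case and split on whitespace (a simplifying assumption).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency (relative count) times inverse document frequency.
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


docs = ["the cat sat", "the dog sat", "the bird flew"]
for doc, w in zip(docs, tf_idf(docs)):
    print(doc, "->", {t: round(v, 3) for t, v in w.items()})
```

A term that occurs in every document (here "the") gets weight zero, while a term unique to one document gets the largest weight.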

21-08-2017: Mel-frequency cepstral coefficients (MFCC). Linear Prediction - orthogonality of prediction error with past samples, optimal linear predictor, stability of prediction filter, autoregressive (AR) process, linear prediction for AR process.
slides 

23-08-2017: Basics of Digital Image Processing - Filtering, Smoothing, Edge Detection, Scale Invariant Feature Transform (SIFT).
slides 

28-08-2017: Matrix and vector derivatives - definition and properties. Dimensionality reduction - preserving maximum data variance - principal component analysis (PCA). Minimum error formulation of PCA. Residual error in PCA. Example of PCA application for handwritten digit images. PRML - Bishop (Appendix, Chapter 12)
slides 
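
The maximum variance view of PCA from this lecture can be sketched with NumPy as below. The function name `pca` and the synthetic data are assumptions for illustration only; this is not the course's reference code.

```python
import numpy as np


def pca(X, k):
    """Return (mean, top-k principal directions, top-k variances) for rows of X."""
    mean = X.mean(axis=0)
    Xc = X - mean                                # center the data
    cov = Xc.T @ Xc / (X.shape[0] - 1)           # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh: symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest eigenvalues
    return mean, eigvecs[:, order], eigvals[order]


# Synthetic 2-D data that varies mostly along the direction (1, 1).
rng = np.random.default_rng(0)
t = rng.normal(size=500)
X = np.outer(t, [1.0, 1.0]) + 0.01 * rng.normal(size=(500, 2))

mean, W, variances = pca(X, k=1)
Z = (X - mean) @ W                               # projected (reduced) data
X_hat = Z @ W.T + mean                           # reconstruction from 1 component
print("first principal direction:", W[:, 0])
print("mean squared residual:", np.mean(np.sum((X - X_hat) ** 2, axis=1)))
```

The residual is small here because almost all of the variance lies along the retained direction.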

30-08-2017: PCA for high dimensional data. Whitening and KL transform. Limitations of PCA. Class dependent dimensionality reduction using linear discriminant analysis (LDA). Fisher discriminant for 2-class case using within-class and between-class matrices. Solution of LDA. Multi-class LDA, PCA versus LDA example. PRML - Bishop (Chapter 4.1.4)
slides 

01-09-2017: Basics of Python Programming. Installing Python, simple commands and functions. Loading speech and image data. Vectorizing, mean computation and spectrogram.
slides code 
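
For readers without the posted code at hand, here is a minimal NumPy sketch of the mean computation and spectrogram steps. The function name `spectrogram`, the frame length, hop size, and the synthetic 1 kHz tone are all assumptions for this example; the course code may differ.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Power spectrogram: one row per frame, one column per frequency bin."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Squared magnitude of the FFT of each frame (real input, so rfft suffices).
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


fs = 8000                                   # assumed sampling rate in Hz
t = np.arange(fs) / fs                      # one second of samples
x = np.sin(2 * np.pi * 1000 * t)            # synthetic 1 kHz tone

print("signal mean:", x.mean())             # vectorized mean, no explicit loop
S = spectrogram(x)
peak_bin = S.mean(axis=0).argmax()
print("shape:", S.shape, "peak frequency (Hz):", peak_bin * fs / 256)
```

With real data, `x` would instead be loaded from the speech file and `fs` taken from its header.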

01-09-2017: Assignment #1. Due on 11-09-2017. Analytical part submitted in class. Coding part submitted via e9205mlsp2017 aT gmail doT com.
HW1 
image data 
speech data 
04-09-2017: Decision theory basics. Minimum classification error rule. MAP and ML based approaches. 3 approaches to ML. Generative versus discriminative modeling. Introduction to generative modeling. Multivariate Gaussian Distribution. PRML - Bishop (Chapter 1.5)
slides 

06-09-2017
MLE for multivariate Gaussian. Sample mean and variance. Limitations of Gaussian modeling. Need for mixture modeling. Probability density of Gaussian Mixture Model (GMM).

slides Future Reading 

11-09-2017: MLE for GMM - Expectation Maximization (EM) algorithm. Proof of EM algorithm. Convergence properties. EM algorithm for GMM parameter estimation. Choice of hidden variable. Refs - Tutorial on GMMs, Proof of EM algorithm, EM algorithm for GMMs
slides  
13-09-2017: Summary of GMM modeling. Application of GMM for unsupervised clustering.
slides  
18-09-2017: Limitations of GMM modeling for sequence data. Markov Chains. Hidden Markov Model (HMM) definition. Three Problems in HMM. "Fundamentals of Speech Recog.", Rabiner and Juang (Chapter 6)

20-09-2017: Evaluating the likelihood using HMM (Problem 1). Complexity reduction using forward variable and backward variable. Finding the best state sequence (Problem 2) - instantaneous probability based, Viterbi algorithm for state sequence segmentation. "Fundamentals of Speech Recog.", Rabiner and Juang (Chapter 6)
Rabiner Tutorial on HMM  
23-09-2017: Re-estimating the HMM parameters - EM algorithm for HMM (Problem 3). Q function definition and solution. Intuitions about HMM training. "Fundamentals of Speech Recog.", Rabiner and Juang (Chapter 6). Ref - EM algorithm for HMMs
Rabiner Tutorial on HMM  
25-09-2017: Non-negative matrix factorization (NMF), problem definition, cost function and constraints. Auxiliary function, proof of convergence, parameter update rule. Application to audio source separation and speech denoising. Refs - Bhiksha Raj Tutorial, Lee Paper
slides  
04-10-2017: First Midterm Exam

09-10-2017: Application of NMF. Audio separation into individual instruments, speech denoising with known and unknown sources. Linear models for regression - problem definition. Least squares regression. Maximum likelihood and least squares regression. PRML - Bishop (Chapter 2)

11-10-2017: Overfitting and Underfitting. Regularized least squares. Linear Models for Classification. Least squares for classification. Sigmoid function and one-of-K encoding. Problems with least squares classification. PRML - Bishop (Chapter 3)
slides 

16-10-2017: Logistic regression - two-class problem. Sigmoid function and posterior probability. Logistic regression - K-class problem. Softmax function, cross entropy error function and maximum likelihood estimation. Linear regression revisited - dual formulation.
PRML - Bishop (Chapter 4, 6)

21-10-2017: Design matrix, kernel function and Gram matrix. Necessary and sufficient condition for kernel functions (Mercer's theorem). Examples of kernel functions.
PRML - Bishop (Chapter 6)

23-10-2017: Margin of linear classifier. Maximum margin classifier formulation. Constraints involved in optimization. Introduction to support vector machines.
PRML - Bishop (Chapter 7)

25-10-2017: Introduction to constrained optimization. Primal and dual problems. Weak and strong duality. Necessary and sufficient conditions for strong duality for convex problems with convex conditions. KKT conditions. Introduction to convex optimization - Boyd (Chapter 5)
Weblink to the book


27-10-2017: Application of convex optimization to SVMs. KKT conditions and solution to problem. Definition of support vectors. Support vector machine for overlapping classes. Trade-off between regularization and training loss.
PRML - Bishop (Chapter 7)
slides 

30-10-2017: SVM application for classification. Support vector regression. Formulation and KKT conditions.
Introduction to neural networks. Parameter learning using gradient descent (scalar case).
PRML - Bishop (Chapter 7)
slides 
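
The scalar case of gradient descent mentioned in this lecture can be sketched as follows. The model y = w * x, the mean squared error cost, the learning rate, and the function name `gradient_descent_scalar` are assumptions chosen for this example, not the setup used in the slides.

```python
def gradient_descent_scalar(xs, ys, lr=0.05, n_steps=200):
    """Fit the single weight w in y = w * x by gradient descent on the MSE."""
    w = 0.0
    n = len(xs)
    for _ in range(n_steps):
        # d/dw of (1/n) * sum((w*x - y)^2) is (2/n) * sum((w*x - y) * x).
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad                      # step against the gradient
    return w


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                        # generated with w = 2
print("estimated w:", gradient_descent_scalar(xs, ys))
```

Too large a learning rate makes the update overshoot and diverge; for this data the iteration is stable only while lr is below 3/14.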

03-11-2017: Gradient descent - vector case. Types of activation functions. XOR problem with NNs. Need for deep architecture neural networks.
Deep Learning - IY (Chapter 6)

04-11-2017: Learning in Neural networks. First order methods - Method of steepest descent. Curvature and Hessians. Second order method - Newton method. Discussion on complexity of learning algorithms.
Deep Learning - IY (Chapter 4), Neural Networks - Bishop (Chapter 4, 7)

06-11-2017: Backpropagation algorithm for learning in deep networks. Linear neuron with MSE algorithm. Disadvantages and limitations of gradient descent algorithm. Neural Networks - Bishop (Chapter 6)

08-11-2017: Second Midterm Exam.


10-11-2017: Types of nonlinearities used. Cost function for regression and classification. Output activation function used in regression and classification. Equivalence between regression with MSE and classification with CE using softmax output activations. Neural Networks - Bishop (Chapter 6)

13-11-2017: Learning and Generalization issues in Neural networks. Decomposing the MSE into bias and variance. Discussion on bias-variance trade-off. Improving learning with regularization. Neural Networks - Bishop (Chapter 9)
slides 

14-11-2017: Assignment #5. Due on 24-11-2017. Analytical part submitted in class. Coding part submitted via e9205mlsp2017 aT gmail doT com.
HW5 
Data For HW5 

15-11-2017: L2 weight regularization, early stopping and training with added noise in the input data. Committees of neural networks. System combination methods and optimization. Neural Networks - Bishop (Chapter 9)
slides 

17-11-2017: Improving the speed of convergence of gradient descent with momentum. Convolutional neural networks. Kernels, pooling and subsampling. Comparison of CNNs and DNNs. Weight sharing and parameter learning. Deep Learning - IY (Chapter 9)

18-11-2017: Understanding the learning in deep layers of CNNs. Recurrent networks. Backpropagation through time for RNN parameter learning. Various RNN architectures - teacher forcing, sequence-to-vector and bidirectional RNNs. Deep Learning - IY (Chapter 10)
slides 

20-11-2017
Long short term memory networks. Deep unsupervised learning - Restricted Boltzmann Machines (RBMs). Conditional independence in RBMs. Learning in RBM with maximum likelihood. Positive and negative partition function. Gibbs sampling and contrastive divergence approximations.
Deep Learning - IY (Chapter 18, 20)
slides 
