E9:205 Machine Learning for Signal Processing

Timing: MW 3:30 - 5:00 pm
Location: EE C241 (MMCR, 1st Floor)
Instructor: Sriram Ganapathy
Office: C 334 (2nd Floor)
Email: sriram aT ee doT iisc doT ernet doT in
Teaching Assistant: Achuth Rao
Lab: C 326 (2nd Floor)
Email: achuthraomv aT gmail doT com
TA Hours: Thu 3-5 pm

Announcements

  • Final exam: Dec 7, 1:30 pm - 4:30 pm, MMCR
          - Open book, open notes. No laptops/cellphones allowed.
  • Take-home exam 2 posted here -
  • Project evaluation: Dec 19, 9:30 am - 1:00 pm, MMCR
          - Single-person projects: max 10 min presentation, max 5 slides.
          - Multi-person projects: both members presenting; max 15 min presentation, max 8 slides.
          - Project components: implementation of a baseline paper, comparison of results with the baseline paper, and novel directions improving the baseline.
          - Project report (single column, max 5 pages): due Dec 17.
          - Project slides: emailed by Dec 18.
          - Mark distribution (total 30 marks): mid-term evaluation (7), final presentation (5), report (5), baseline implementation (7), novelty (6).

Syllabus

  • Introduction to real world signals - text, speech, image, video.
  • Feature extraction and front-end signal processing - information-rich representations, robustness to noise and artifacts, signal enhancement, bio-inspired feature extraction.
  • Basics of pattern recognition. Generative modeling - Gaussian and Gaussian mixture models, hidden Markov models, factor analysis and latent variable models.
  • Discriminative modeling - support vector machines, neural networks and back propagation.
  • Introduction to deep learning - convolutional and recurrent networks, pre-training and practical considerations in deep learning, understanding deep networks.
  • Clustering methods and decision trees. Feature selection methods.
  • Applications in computer vision and speech recognition.

Grading Details

Assignments: 15%
Midterm exam: 20%
Final exam: 35%
Project: 30%

Pre-requisites

  • Random Processes / Probability and Statistics
  • Linear Algebra/Matrix Theory
  • Basic Digital Signal Processing/Signals and Systems

Textbooks

References

  • “Deep Learning: Methods and Applications”, Li Deng, Microsoft Technical Report.
  • “Automatic Speech Recognition: A Deep Learning Approach”, D. Yu and L. Deng, Springer, 2014.
  • “Machine Learning for Audio, Image and Video Analysis”, F. Camastra and A. Vinciarelli, Springer, 2007. pdf

Slides



03-08-2016 Introduction to real world signals - text, speech, image, video. Learning as a pattern recognition problem. Examples. Roadmap of the course.

slides
08-08-2016 Types of learning methods, feature extraction for speech and audio, short-time Fourier transform (STFT), narrowband and wideband spectrograms, time-frequency resolution.
Refs - Dan Ellis-Tutorial     Ricardo-Tutorial
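
A minimal NumPy sketch of the short-time analysis above; the window length win (chosen here only for illustration) sets the narrowband/wideband trade-off:

    import numpy as np

    def spectrogram(x, win=256, hop=128):
        """Magnitude STFT: short win -> wideband (good time resolution),
        long win -> narrowband (good frequency resolution)."""
        w = np.hanning(win)
        n_frames = 1 + (len(x) - win) // hop
        frames = np.stack([x[i*hop : i*hop + win] * w
                           for i in range(n_frames)])
        return np.abs(np.fft.rfft(frames, axis=1))   # (frames, win//2 + 1)

    # Example: a 1 kHz tone sampled at 8 kHz
    t = np.arange(8000) / 8000.0
    S = spectrogram(np.sin(2 * np.pi * 1000 * t))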

slides
10-08-2016 Uncorrelated noise in speech/audio, non-negative matrix factorization (NMF) - problem definition, cost function and constraints, auxiliary function, proof of convergence, parameter update rules. Application to audio source separation and speech denoising.
Refs - Bhiksha Raj-Tutorial     Lee-Paper
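
A minimal NumPy sketch of the multiplicative update rules for the Frobenius-norm NMF cost (as derived via the auxiliary function in the Lee paper); shapes and names are illustrative:

    import numpy as np

    def nmf(V, r, n_iter=200, eps=1e-9):
        """Factor non-negative V (F x T) as W (F x r) @ H (r x T) by
        minimizing ||V - W H||_F^2 with multiplicative updates."""
        F, T = V.shape
        rng = np.random.default_rng(0)
        W = rng.random((F, r)) + eps
        H = rng.random((r, T)) + eps
        for _ in range(n_iter):
            H *= (W.T @ V) / (W.T @ W @ H + eps)   # update rule for H
            W *= (V @ H.T) / (W @ H @ H.T + eps)   # update rule for W
        return W, H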

slides
17-08-2016 Linear prediction - orthogonality of the prediction error with past samples, optimal linear predictor, Yule-Walker equations, energy of the prediction error, stability of the prediction filter, autoregressive (AR) processes, linear prediction for AR processes.
Ref - Theory of LP - Vaidyanathan [Chap - 2, 5.3, A, B]
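
A small NumPy sketch of the autocorrelation method: form the Yule-Walker normal equations and solve for the optimal predictor (the AR(2) example at the end is illustrative):

    import numpy as np

    def lpc(x, p):
        """Order-p linear predictor via the Yule-Walker equations."""
        x = x - np.mean(x)
        r = np.correlate(x, x, mode='full')[len(x)-1 : len(x)+p]   # lags 0..p
        R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
        a = np.linalg.solve(R, r[1:p+1])   # predictor coefficients
        err = r[0] - a @ r[1:p+1]          # prediction error energy
        return a, err

    # Fit an AR(2) process: a should approach [0.75, -0.5]
    rng = np.random.default_rng(0)
    e, x = rng.standard_normal(10000), np.zeros(10000)
    for n in range(2, 10000):
        x[n] = 0.75 * x[n-1] - 0.5 * x[n-2] + e[n]
    a, err = lpc(x, 2)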
22-08-2016
Normal equations for the autoregressive process. Power spectral density (PSD). Autoregressive modeling of the PSD. Applications of linear prediction.

First assignment - Non-negative matrix factorization, linear prediction, applications to face images and noisy speech.
Due Date - 02-09-2016 (Noon)
slides


HW1.pdf
images.zip speech.zip
24-08-2016
Matrix derivative rules. Dimensionality Reduction I - Principal component analysis (PCA), maximum variance formulation, minimum error formulation. Whitening and standardization, PCA for high dimensional data. Linear discriminant analysis (LDA), Fisher discriminant for two classes.
Ref - PRML - Bishop
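
A compact NumPy sketch of PCA via eigendecomposition of the sample covariance, including the whitening step mentioned above (names are illustrative):

    import numpy as np

    def pca(X, k):
        """Project rows of X (N x D) onto the top-k principal directions."""
        mu = X.mean(axis=0)
        Xc = X - mu
        C = Xc.T @ Xc / (len(X) - 1)       # sample covariance (D x D)
        vals, vecs = np.linalg.eigh(C)     # ascending eigenvalues
        idx = np.argsort(vals)[::-1][:k]   # top-k (maximum variance)
        U, lam = vecs[:, idx], vals[idx]
        Z = Xc @ U                         # PCA projection
        Zw = Z / np.sqrt(lam)              # whitened (unit variance)
        return Z, Zw, U, mu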

29-08-2016 LDA for multiple classes, LDA formulation in lower dimensional subspace. Applications of PCA. Distinction between PCA and LDA. Introduction to feature extraction from image data - Wavelet transform, mother wavelet, scaling and shifting, Continuous and Dyadic Wavelet Transform.
Ref - Introduction to Wavelets and Wavelet Transforms - Burrus et al.

31-08-2016 Dyadic wavelet transform. Scaling and wavelet functions. Approximation and detail. Wavelet decomposition. Application to 1-D signals.
Ref - Selected Pages - Burrus et al. (Chap. 2)
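
A minimal sketch of a dyadic (Haar) decomposition into approximation and detail coefficients; it assumes len(x) is divisible by 2**levels:

    import numpy as np

    def haar_dwt(x, levels=3):
        """Returns [detail_1, ..., detail_L, approximation_L]."""
        coeffs, a = [], np.asarray(x, dtype=float)
        for _ in range(levels):
            a, d = ((a[0::2] + a[1::2]) / np.sqrt(2.0),   # approximation
                    (a[0::2] - a[1::2]) / np.sqrt(2.0))   # detail
            coeffs.append(d)
        coeffs.append(a)
        return coeffs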

handout
05-09-2016 Interpreting wavelet approximation and detail coefficients. Filter bank approach to wavelets. Extension to the 2-D wavelet transform, application to images.
Ref - Tutorial on 2-D Wavelets
Image Denoising

handout
07-09-2016 Decision Theory - Inference and decision rule, mis-classification error, maximum posterior decision rule, expected loss, minimum mean square error decision rule for regression. Three approaches to inference and decision - Generative modeling, Discriminative modeling and Discriminant Functions.
Ref - PRML - Bishop (Sec. 1.5)

12-09-2016 Introduction to generative modeling. Gaussian distribution. Parameter estimation using maximum likelihood (MLE). Sample mean and sample covariance. Limitations of Gaussian modeling. Gaussian mixture model (GMM) density function.

slides
14-09-2016 MLE for GMM - Expectation Maximization (EM) algorithm. Proof of EM algorithm. Convergence properties. EM algorithm for GMM parameter estimation. Choice of hidden variable. Application of GMMs for unsupervised data clustering.
Ref - Tutorial GMMs
Proof of EM algorithm
EM algorithm for GMMs
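
A minimal NumPy sketch of EM for a diagonal-covariance GMM, with the E-step responsibilities and M-step updates outlined above:

    import numpy as np

    def gmm_em(X, K, n_iter=50, seed=0):
        """Fit a K-component diagonal GMM to X (N x D) by EM."""
        N, D = X.shape
        rng = np.random.default_rng(seed)
        mu = X[rng.choice(N, K, replace=False)]   # init means from data
        var = np.ones((K, D)) * X.var(axis=0)
        pi = np.full(K, 1.0 / K)
        for _ in range(n_iter):
            # E-step: log pi_k + log N(x | mu_k, var_k), then normalize
            logp = (np.log(pi) - 0.5 * (((X[:, None, :] - mu) ** 2) / var
                    + np.log(2 * np.pi * var)).sum(-1))        # (N, K)
            logp -= logp.max(axis=1, keepdims=True)
            gamma = np.exp(logp)
            gamma /= gamma.sum(axis=1, keepdims=True)          # responsibilities
            # M-step: weighted counts, means, variances, weights
            Nk = gamma.sum(axis=0)
            mu = (gamma.T @ X) / Nk[:, None]
            var = (gamma.T @ X**2) / Nk[:, None] - mu**2 + 1e-6
            pi = Nk / N
        return pi, mu, var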

slides
19-09-2016 Markov chains - sequence modeling with hidden Markov models (HMM). Definition of HMM parameters. Three problems in HMMs: (i) evaluation, (ii) inference and (iii) training. Direct computation of the likelihood. Forward and backward variable recursions.
Ref - Rabiner, Juang, "Fundamentals of speech recognition", Chap 6
Ref - SP Magazine Article - Rabiner
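
A short sketch of the scaled forward recursion for a discrete-observation HMM (scaling prevents numerical underflow; names are illustrative):

    import numpy as np

    def hmm_forward(A, B, pi, obs):
        """A: (S,S) transitions, B: (S,V) emissions, pi: (S,) initial.
        Returns log p(obs) via the scaled forward variables."""
        alpha = pi * B[:, obs[0]]
        c = alpha.sum(); alpha = alpha / c
        loglik = np.log(c)
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]    # forward recursion
            c = alpha.sum(); alpha = alpha / c
            loglik += np.log(c)
        return loglik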

21-09-2016 Solution to problem (ii) in HMM - Viterbi algorithm. HMM parameter estimation with EM algorithm. Estimation of Q function and iterative model update.
Ref - Tutorial HMMs
Rabiner, Juang, "Fundamentals of speech recognition", Chap 6
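
A minimal log-domain Viterbi sketch for the same discrete-HMM setup (assumes strictly positive parameters):

    import numpy as np

    def viterbi(A, B, pi, obs):
        """Most likely state sequence via dynamic programming."""
        logA, logB = np.log(A), np.log(B)
        T, S = len(obs), len(pi)
        delta = np.log(pi) + logB[:, obs[0]]
        psi = np.zeros((T, S), dtype=int)            # back-pointers
        for t in range(1, T):
            scores = delta[:, None] + logA           # (prev, next)
            psi[t] = scores.argmax(axis=0)
            delta = scores.max(axis=0) + logB[:, obs[t]]
        path = [int(delta.argmax())]
        for t in range(T - 1, 0, -1):                # backtrack
            path.append(int(psi[t, path[-1]]))
        return path[::-1]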

Second assignment (Part A) - PCA/LDA, ML, Gaussian and GMM, HMM
Due Date - 03-10-2016 (Class)





HW2-a.pdf
26-09-2016 First Mid-term Exam


28-09-2016
Discussion on first mid-term exam. Topics for mini-projects
Project list
03-10-2016 Hidden Markov Models with GMM observation densities. Application of EM algorithm for GMM-HMMs. Parameter estimation. Application of HMMs in video analysis. Dimensionality reduction continued - latent variable models.
Refs - GMM-HMM - "Fundamentals of Speech Recognition", Rabiner, Chap 6.
Slides from N. Ramanathan - Video analysis with HMMs

05-10-2016 Probabilistic PCA (PPCA) - generative model description. Log-likelihood computation, parameter estimation using direct optimization. EM algorithm for PPCA. Extension to factor analysis. Summary of generative modeling. Introduction to discriminative modeling - non-linear regression with kernels.
Ref - PRML - Bishop (Sec. 12.2, 3.1)
Paper - "PPCA", Tipping et al

12-10-2016 Recap of generative versus discriminative modeling. Non-linear regression with regularization. Dual problem definition and solution with kernels. Properties of kernel functions. Constructing kernels from basic blocks. Sparse kernel machines.
Ref - PRML - Bishop (Sec. 3.3, 6)
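
A minimal sketch of the dual solution for regularized (kernel ridge) regression with an RBF kernel; lam and gamma are illustrative hyperparameters:

    import numpy as np

    def kernel_ridge(X, y, X_test, lam=1e-2, gamma=1.0):
        """alpha = (K + lam I)^{-1} y;  f(x) = sum_n alpha_n k(x_n, x)."""
        def rbf(A, B):
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
            return np.exp(-gamma * d2)
        K = rbf(X, X)
        alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
        return rbf(X_test, X) @ alpha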

slides
17-10-2016 Classifiers with kernels. Definition of margin. Maximum margin classifiers. Introduction to convex optimization with constraints. Primal and dual problems. Weak and strong duality. Karush-Kuhn-Tucker (KKT) conditions for strong duality. Solving the dual problem for maximum margin classifiers. Definition of support vectors.
Ref - PRML - Bishop (Chap 7.1)
Book (Chap 5) - "Convex Optimization", Boyd and Vandenberghe
Second assignment (Part B) - Implementing PCA/LDA, GMM and HMM
Due Date - 28-10-2016 (Noon)





HW2-b.pdf
19-10-2016 Maximum margin classifiers for overlapping class distributions, concept of slack variables. Lagrangian and dual form. KKT conditions for solving the optimal parameters. Sequential minimal optimization algorithm - analytic solution to two variable constrained optimization problem, heuristics for choosing the two variables. Estimating the bias parameter in SVM.
Ref - PRML - Bishop (Chap 7.1.1)
SMO paper - J. Platt et al.
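
For experimentation, scikit-learn's SVC (built on LIBSVM, whose solver is an SMO-type method) exposes the quantities above; this toy example is only a sketch:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-1, 1, (50, 2)),    # class 0
                   rng.normal(+1, 1, (50, 2))])   # class 1
    y = np.array([0] * 50 + [1] * 50)

    clf = SVC(kernel='rbf', C=1.0, gamma=0.5)     # C penalizes slack
    clf.fit(X, y)
    print(clf.n_support_)                         # support vectors per class
    print(clf.intercept_)                         # the bias parameter b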


24-10-2016 Summary of support vector machines - problem definition, primal and dual formulations, kernel space transformation, solutions and implications, applications of SVMs in cancer diagnosis and text categorization. Support vector regression - slack variables and dual formulation.
Ref - PRML - Bishop (Chap 7.1.4)
NYU Bio medicine - Tutorial

slides
31-10-2016 Introduction to neural networks. Illustration with the XOR problem - need for hidden layer(s) with non-linear activations. Optimization methods for neural networks. First-order Taylor series - gradient descent method. Curvature and second derivatives. Jacobian and Hessian matrices. Newton's method. Stochastic gradient descent.
Ref - DLB (Deep Learning Book) - Goodfellow, Bengio (Chap 6, Chap 4.3)
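
A tiny numerical illustration of the first- versus second-order view: gradient descent iterates on a quadratic, while Newton's method solves it in one step (H and b are made up for the example):

    import numpy as np

    # f(w) = 0.5 w^T H w - b^T w;  gradient = H w - b;  Hessian = H
    H = np.array([[3.0, 0.5], [0.5, 1.0]])
    b = np.array([1.0, -1.0])
    grad = lambda w: H @ w - b

    w = np.zeros(2)
    for _ in range(100):                # gradient descent, fixed step size
        w = w - 0.1 * grad(w)

    w_newton = np.linalg.solve(H, b)    # Newton: exact for a quadratic
    print(w, w_newton)                  # both approach H^{-1} b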

02-11-2016 Neural networks estimate posterior probabilities. Architecture considerations - cost function (mean square error, cross entropy), output units (linear, sigmoidal or softmax), hidden unit activations (ReLU and variants, tanh or sigmoidal).
Ref - DLB (Deep Learning Book) - Goodfellow, Bengio (Chap 6)


07-11-2016 Universal approximation properties of NNs. Need for multiple hidden layers. Depth versus width. Mechanism of representation learning in deep networks. Parameter learning in deep networks - back propagation. Equivalence in learning DNNs with linear output activation and MSE versus softmax activations with cross entropy error.
Ref - DLB (Deep Learning Book) - Goodfellow, Bengio (Chap 6)
ASR - DL approach, D. Yu, Li Deng (Chap 4).
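
A minimal NumPy sketch of back propagation for a one-hidden-layer network with softmax outputs and cross-entropy loss, where the output-layer delta reduces to p - y (the XOR usage below is illustrative):

    import numpy as np

    def train_step(X, Y, W1, b1, W2, b2, lr=0.1):
        """One gradient step: tanh hidden layer, softmax + cross entropy."""
        Hid = np.tanh(X @ W1 + b1)                       # forward pass
        logits = Hid @ W2 + b2
        logits -= logits.max(axis=1, keepdims=True)      # stable softmax
        P = np.exp(logits); P /= P.sum(axis=1, keepdims=True)
        dlogits = (P - Y) / len(X)                       # softmax+CE gradient
        dW2, db2 = Hid.T @ dlogits, dlogits.sum(0)
        dpre = (dlogits @ W2.T) * (1 - Hid ** 2)         # back through tanh
        dW1, db1 = X.T @ dpre, dpre.sum(0)
        return (W1 - lr * dW1, b1 - lr * db1,
                W2 - lr * dW2, b2 - lr * db2)

    # Learn XOR: 2 inputs, 8 hidden units, 2 classes
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(0, 0.5, (2, 8)), np.zeros(8)
    W2, b2 = rng.normal(0, 0.5, (8, 2)), np.zeros(2)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    Y = np.eye(2)[[0, 1, 1, 0]]                          # one-hot XOR labels
    for _ in range(2000):
        W1, b1, W2, b2 = train_step(X, Y, W1, b1, W2, b2)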


09-11-2016 Summary of NN learning and architecture. Pseudo-code for back propagation. Other considerations - data preprocessing, model initialization. Underfitting versus overfitting. Improving generalization with regularization. L2 regularization. Quadratic approximation analysis of L2 regularization.
Ref - DLB (Deep Learning Book) - Goodfellow, Bengio (Chap 6, 7)


slides
14-11-2016 L1 regularization. Multi-task learning. Early stopping. Equivalence between L2 regularization and early stopping. Bagging and ensemble averaging. Dropout.
Ref - DLB (Deep Learning Book) - Goodfellow, Bengio (Chap 7)
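
A one-function sketch of inverted dropout: zero units with probability 1 - p_keep at training time and rescale, so nothing changes at test time:

    import numpy as np

    def dropout(H, p_keep=0.5, train=True, rng=None):
        """Inverted dropout on a layer of activations H."""
        if not train:
            return H                       # identity at test time
        rng = rng or np.random.default_rng()
        mask = rng.random(H.shape) < p_keep
        return H * mask / p_keep           # rescale kept units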

16-11-2016 Convolutional neural networks. Filtering and hierarchical sparsity. Pooling and striding. Deep convolutional networks.
Discussion on second mid-term exam
Ref - DLB (Deep Learning Book) - Goodfellow, Bengio (Chap 9)
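
A small sketch of a 'valid' strided 2-D convolution (implemented as cross-correlation, the CNN convention); the vertical-edge filter is illustrative:

    import numpy as np

    def conv2d(img, kernel, stride=1):
        """Slide kernel over img; pooling follows the same window pattern."""
        H, W = img.shape
        kh, kw = kernel.shape
        return np.array([[np.sum(img[i:i+kh, j:j+kw] * kernel)
                          for j in range(0, W - kw + 1, stride)]
                         for i in range(0, H - kh + 1, stride)])

    edge = np.array([[1, 0, -1]] * 3, dtype=float)   # vertical-edge filter
    img = np.zeros((8, 8)); img[:, :4] = 1.0
    print(conv2d(img, edge))                         # responds at the edge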

21-11-2016 Deep Generative Models - Restricted Boltzmann Machine, model definition, conditional independence. Relationship with sigmoidal activation. Parameter learning in RBM - positive and negative phase, approximation with sampling methods, contrastive divergence algorithm. Deep Belief Networks (DBNs).
Ref - DLB (Deep Learning Book) - Goodfellow, Bengio (Chap 18,20)
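
A minimal sketch of one contrastive-divergence (CD-1) update for a binary RBM, with the positive and negative phases marked (shapes and names are illustrative):

    import numpy as np

    def cd1_step(V0, W, b, c, lr=0.1, rng=None):
        """V0: (N, V) binary data; W: (V, H); b: (V,) visible and
        c: (H,) hidden biases. One Gibbs step for the negative phase."""
        rng = rng or np.random.default_rng(0)
        sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
        ph0 = sigmoid(V0 @ W + c)                   # p(h=1 | v0), positive phase
        h0 = (rng.random(ph0.shape) < ph0) * 1.0    # sample hidden units
        pv1 = sigmoid(h0 @ W.T + b)                 # reconstruction p(v=1 | h0)
        ph1 = sigmoid(pv1 @ W + c)                  # negative phase
        n = len(V0)
        W += lr * (V0.T @ ph0 - pv1.T @ ph1) / n    # positive - negative stats
        b += lr * (V0 - pv1).mean(axis=0)
        c += lr * (ph0 - ph1).mean(axis=0)
        return W, b, c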

23-11-2016 Gaussian Restricted Boltzmann Machine (GRBM). Relationship with GMMs. Summary of Deep learning methods.
Ref - ASR - Deep Learning Approach (Yu and Deng) - (Chap 5)

slides
29-11-2016 Take Home Practice Exam

Q-paper