MLSP Fall 2018

When	MW 3:45 - 5:15 pm S 8:30 - 10:00 am (Tentative)
Where	EE B308
Who	Sriram Ganapathy
Office	C 334 (2nd Floor)
Email	sriramg aT iisc doT ac doT in
Teaching Assistant	Akshara Soman
Lab	C 328 (2nd Floor)
Email	aksharas aT iisc doT ac doT in

Announcements

Class Location will be B-303 EE (Note the Change)

Course Enrollment Form - Please fill it in here if you are interested in auditing or crediting the course.
Fifth Assignment
- Posted here Due on 23-11-2018.
Second Mid-Term Exam Date and Time: Nov. 9, 4:00 pm, B308. Open Book, Open Notes.
Project Midterm Review: Nov. 17, 10 am
- Presentation of 5 minutes for individual projects and 10 minutes for group projects.
- 2 slides on literature survey, 2 slides on progress thus far and 1 slide on future plan of the work.


Syllabus

Introduction to real world signals - text, speech, image, video.
Feature extraction and front-end signal processing - information rich representations, robustness to noise and artifacts.
Basics of pattern recognition, Generative modeling - Gaussian and mixture Gaussian models, hidden Markov models.
Discriminative modeling - support vector machines, neural networks and back propagation.
Introduction to deep learning - convolutional and recurrent networks, pre-training and practical considerations in deep learning, understanding deep networks.
Deep generative models - Autoencoders, Boltzmann machines, Adversarial Networks, Variational Learning.
Applications in NLP, computer vision and speech recognition.


Grading Details

Assignments	15%
Midterm exam.	20%
Final exam.	35%
Project	30%

Pre-requisites

Must - Random Process/Probability and Statistics
Must - Linear Algebra/Matrix Theory
Preferred - Basic Digital Signal Processing/Signals and Systems


Textbooks

“Pattern Recognition and Machine Learning”, C.M. Bishop, 2nd Edition, Springer, 2011.

“Neural Networks for Pattern Recognition”, C.M. Bishop, Oxford University Press, 1995.

“Deep Learning”, I. Goodfellow, Y. Bengio, A. Courville, MIT Press, 2016. html
“Fundamentals of speech recognition”, L. Rabiner and H. Juang, Prentice Hall, 1993.

References

“Deep Learning : Methods and Applications”, Li Deng, Microsoft Technical Report.
“Automatic Speech Recognition - Deep learning approach” - D. Yu, L. Deng, Springer, 2014.
“Machine Learning for Audio, Image and Video Analysis”, F. Camastra, A. Vinciarelli, Springer, 2007. pdf
Various Published Papers and Online Material
Python Programming Basics pdf


Slides

06-08-2018	Introduction to real world signals - text, speech, image, video. Learning as a pattern recognition problem. Examples. Roadmap of the course.	slides
08-08-2018	Basics of Natural Language Processing - token, document and corpus. TF-IDF features. Language modeling. Smoothing and back-off. Introduction to audio signal processing. DFT, STFT. Information Extraction Book (Chapter 6) - TF-IDF Stanford Reading Material - Language Modeling	Weblink Weblink
13-08-2018	Revisiting text processing. Perplexity. Short-time Fourier Transform considerations. Time-frequency resolution. Mel-frequency cepstral coefficient (MFCC) features. Image processing - filtering, convolutions. Matrix derivatives. Columbia Univ. STFT Tutorial PRML - Bishop (Appendix, Chapter 12)	slides Weblink
20-08-2018	Unsupervised dimensionality reduction using Principal Component Analysis. Maximum variance formulation. Solution using eigenvectors of data covariance matrix. Minimum error formulation. Whitening and standardization. PCA for high dimensional data. PRML - Bishop (Chapter 12.1)
27-08-2018	Supervised dimensionality reduction using linear discriminant analysis (LDA). Fisher discriminant. Solution for 2 class LDA. Multi-class LDA. Comparison between PCA and LDA. Introduction to basics of decision theory. Inference and decision problems. Prior, likelihood and posterior. Maximum-a-posteriori decision rule for two class example. PRML - Bishop (Chapter 4.1.4) PRML - Bishop (Chapter 1.5)	slides
29-08-2018	Decision theory for regression. MMSE estimation. Multi-variate Gaussian Modeling. Interpretation of Covariance. Diagonal and Full Covariance. Maximum Likelihood estimation of mean and covariance. PRML - Bishop (Chapter 1.6) Further Reading
30-08-2018	Assignment #1. Due on 10-09-2018. Analytical part submitted in class. Coding part submitted via mlsp18.iisc aT gmail doT com.	HW1	image data	speech data
31-08-2018	Shortcomings of single Gaussian modeling. Introduction to mixture Gaussian modeling. Properties and parameters. Expectation Maximization algorithm - auxiliary function, proof of convergence. Ref - Tutorial on GMMs Proof of EM algorithm	slides
10-09-2018	Expectation Maximization Algorithm for GMMs. Initialization using K-means. Other Considerations in GMMs. GMM example for unsupervised clustering. Ref - EM algorithm for GMMs	slides
12-09-2018	Limitations of GMM modeling for sequence data. Markov Chains. Hidden Markov Model (HMM) definition. Three Problems in HMM. Evaluating the likelihood using HMM (Problem 1), Complexity reduction using forward variable and backward variable. "Fundamentals of Speech Recog.", Rabiner and Juang (Chapter 6) Ref - Rabiner Tutorial on HMMs	Rabiner Slides on HMM
13-09-2018	Assignment #2. Analytical part submitted in class (26-09-2018). Coding part submitted via mlsp18.iisc aT gmail doT com (28-09-2018).	HW2	speech-music-data
17-09-2018	Assignment #1 Discussion.
19-09-2018	Inferring the best state alignment - Viterbi algorithm for HMM (Solution to Prob. II). Training of HMM using EM algorithm - Baum Welch Algorithm (Solution to Prob. III). "Fundamentals of Speech Recog.", Rabiner and Juang (Chapter 6) Ref - Rabiner Tutorial on HMMs
21-09-2018	EM algorithm for HMM with GMM state distribution. Dealing with multiple observation sequences. Implementation issues in HMM. Applications of HMMs - action recognition, face emotion tracking. "Fundamentals of Speech Recog.", Rabiner and Juang (Chapter 6) Ref - Video analysis with HMMs
24-09-2018	Probabilistic PCA. Problem formulation - generative model of the data. EM algorithm for parameter estimation. Application of PPCA. PRML - Bishop (Chapter 12.2)	slides
01-10-2018	Regularized linear Regression revisited - dual problem formulation. Gram Matrix. Kernel functions. Examples. PRML - Bishop (Chapter 6)
03-10-2018	Mid-term Exam
10-10-2018	Properties of kernel functions. Rules for constructing kernels. The RBF kernel. Maximum margin classifiers - problem formulation for linearly separable case. Optimization fundamentals - primal and dual problems, strong duality, KKT conditions. Application of KKT conditions to maximum margin classifiers. PRML - Bishop (Chapter 7) Introduction to convex optimization - Boyd (Chapter 5) Weblink to the book
12-10-2018	Assignment #3. Analytical part submitted in class and Coding part submitted via mlsp18.iisc aT gmail doT com (22-10-2018).	HW3
12-10-2018	Maximum margin classifiers - overlapping class distribution. Slack variables. Primal and Dual formulation. KKT conditions. Applications of SVM for text classification, cancer detection, MNIST. PRML - Bishop (Chapter 7)	slides SVM-Application-slides
15-10-2018	Support vector regression - primal and dual, KKT conditions. Introduction to artificial neural networks - extension of kernel machines. Perceptron model. Learning rule in perceptron. Multi-layer perceptron. PRML - Bishop (Chapter 7) NNPR - Bishop (Chapter 3,4)
17-10-2018	Forward pass in MLP. Backpropagation algorithm - recursion. Choice of hidden layer activation function. NNPR - Bishop (Chapter 4)
22-10-2018	Computational complexity in Gradient Descent. Definition of Jacobian and Hessian matrices. Approximation to Hessian matrix computation. Choice of error function. Mean square and conditional expectation. Conditional expectation for classification with one-hot-encoding. Neural networks estimate posterior probabilities. NNPR - Bishop (Chapter 6)
24-10-2018	Assignment #4. Analytical part submitted in class and Coding part submitted via mlsp18.iisc aT gmail doT com (22-10-2018).	HW4
24-10-2018	Cross entropy for two class. Expected cross entropy loss and posterior probability estimation. General condition on error function for outputs to be posterior probability. Weight learning - gradient descent method. Properties of gradient descent using quadratic approximation. Learning rate parameter. NNPR - Bishop (Chapter 6,7)	slides
29-10-2018	Drawbacks of gradient descent. Momentum in learning. Nesterov Accelerated gradient. Second order learning methods. Approximate Hessian. Data preprocessing for Neural networks. Decomposing the error into bias and variance. NNPR - Bishop (Chapter 7,9)
31-10-2018	Improving generalization in deep learning. Regularization - weight decay. Impact of regularization on weight update. Early stopping. Training with noise. Committee of Neural networks. Need for deep architectures. NNPR - Bishop (Chapter 9)	slides
02-11-2018	Convolutional Neural Networks, Computation of convolutions. Number of parameters. Advantages over deep neural networks. Pooling and subsampling. Backpropagation in convolution. Insights in deep convolutional networks. Deep Learning Book - Goodfellow et al. (Chapter 9)	slides
05-11-2018	Recurrent Neural Networks. Backpropagation through time. Problem of vanishing gradients. Long short term memory networks. Various RNN architectures and applications. Deep Learning Book - Goodfellow et al. (Chapter 10)	slides
09-11-2018	Mid-term Exam 2
12-11-2018	Deep generative modeling - Restricted Boltzmann Machines (RBMs). Conditional independence property. RBM parameter learning. Positive and negative phase of learning. Intuitions behind contrastive divergence algorithm. Deep Learning Book - Goodfellow et al. (Chapter 18,19,20)	slides
14-11-2018	Assignment #5. Analytical part and Coding part submitted via mlsp18.iisc aT gmail doT com (23-11-2018).	HW5
14-11-2018	Autoencoders. Denoising AE. Variational autoencoders. Variational lower bound derivation. KL divergence derivation. Data generation with VAEs. Kingma's paper link VAE Tutorial link	slides
19-11-2018	Generative Adversarial Nets (GANs), Attention Networks. Summary of MLSP course.	slides
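
As a small illustration of one topic from the schedule above (the 20-08-2018 lecture on PCA: maximum variance formulation, solved using eigenvectors of the data covariance matrix), here is a minimal NumPy sketch. It is not taken from the course material; the function name `pca`, the parameter `k`, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X (N x D) onto the top-k principal directions.

    Illustrative helper, not course code.
    """
    mean = X.mean(axis=0)
    Xc = X - mean                               # centre the data
    cov = Xc.T @ Xc / (X.shape[0] - 1)          # D x D sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]       # indices of the k largest
    W = eigvecs[:, order]                       # D x k orthonormal basis
    return Xc @ W, W, eigvals[order]

# Toy data (assumed): 500 points stretched along the first axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.2])

Z, W, variances = pca(X, k=2)
print(Z.shape)       # projected data, one row per sample
print(variances)     # variance captured along each retained direction
```

The eigenvalues returned are the variances of the projected data along each retained direction, which is the sense in which the leading eigenvectors give the maximum variance projection.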
