Timing | MW 3:30 - 5:00 pm |
Location | EE C241 (MMCR 1st Floor) |
Instructor | Sriram Ganapathy |
Office | C 334 (2nd Floor) |
Email | sriram aT ee doT iisc doT ernet doT in |
Teaching Assistant | Achuth Rao |
Lab | C 326 (2nd Floor) |
Email | achuthraomv aT gmail doT com |
TA Hours | Thu 3-5 pm |
Announcements
- Final exam: Dec 7, 1:30pm-4:30pm, MMCR
      - Open book, open notes. No laptops/cellphones allowed.
- Take-home exam 2 posted here
- Project evaluation: Dec 19, 9:30am-1pm, MMCR
      - Single-person projects: max 10 min presentation, max 5 slides
      - Multi-person projects: both members presenting, max 15 min presentation, max 8 slides
      - Project components: implementation of the baseline paper, comparison of results with the baseline paper, and novel directions improving the baseline
      - Project report (max 5 pages, single column) due Dec 17
      - Project slides to be emailed by Dec 18
      - Mark distribution (total 30 marks): mid-term evaluation (7), final presentation (5), report (5), baseline implementation (7), novelty (6)
Syllabus
- Introduction to real world signals - text, speech, image, video.
- Feature extraction and front-end signal processing - information rich representations, robustness to noise and artifacts, signal enhancement, bio inspired feature extraction.
- Basics of pattern recognition, Generative modeling - Gaussian and mixture Gaussian models, hidden Markov models, factor analysis and latent variable models.
- Discriminative modeling - support vector machines, neural networks and back propagation.
- Introduction to deep learning - convolutional and recurrent networks, pre-training and practical considerations in deep learning, understanding deep networks.
- Clustering methods and decision trees. Feature selection methods.
- Applications in computer vision and speech recognition.
Grading Details
Assignments | 15% |
Midterm exam | 20% |
Final exam | 35% |
Project | 30% |
Pre-requisites
- Random Processes/Probability and Statistics
- Linear Algebra/Matrix Theory
- Basic Digital Signal Processing/Signals and Systems
Textbooks
- “Pattern Recognition and Machine Learning”, C. M. Bishop, Springer, 2006.
- “Deep Learning”, I. Goodfellow, Y. Bengio, A. Courville, MIT Press, 2016. html
- “Digital Image Processing”, R. C. Gonzalez, R. E. Woods, 3rd Edition, Prentice Hall, 2008.
- “Fundamentals of Speech Recognition”, L. Rabiner and B.-H. Juang, Prentice Hall, 1993.
References
- “Deep Learning: Methods and Applications”, L. Deng and D. Yu, Microsoft Research Technical Report.
- “Automatic Speech Recognition: A Deep Learning Approach”, D. Yu and L. Deng, Springer, 2014.
- “Machine Learning for Audio, Image and Video Analysis”, F. Camastra and A. Vinciarelli, Springer, 2007. pdf
Slides
03-08-2016 | Introduction to real world signals - text, speech, image, video. Learning as a pattern recognition problem. Examples. Roadmap of the course. |
slides |
08-08-2016 | Types of learning methods, feature extraction for speech and audio, short-term Fourier transform, narrowband and wideband spectrograms, time-frequency resolution. Refs - Dan Ellis-Tutorial, Ricardo-Tutorial |
slides |
10-08-2016 | Uncorrelated noise in speech/audio, non-negative matrix factorization (NMF), problem definition, cost function and constraints, auxiliary function, proof of convergence, parameter update rule. Application to audio source separation and speech denoising. Refs - Bhiksha Raj-Tutorial, Lee-Paper |
slides |
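A minimal sketch of the multiplicative-update NMF discussed above (Lee-Seung updates for the Euclidean cost), assuming the input V is a non-negative magnitude spectrogram; function and variable names are illustrative, not taken from the course material.

import numpy as np

def nmf_euclidean(V, rank, n_iter=200, eps=1e-10):
    # Factorize V (freq x frames) as W (freq x rank) times H (rank x frames),
    # minimizing ||V - W H||_F^2 with multiplicative updates.
    rng = np.random.default_rng(0)
    W = rng.random((V.shape[0], rank)) + eps   # basis spectra
    H = rng.random((rank, V.shape[1])) + eps   # activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update keeps H non-negative
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update keeps W non-negative
    return W, H

Each source is then approximated by the product of a subset of columns of W with the matching rows of H, which underlies the separation and denoising applications mentioned in the lecture.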
17-08-2016 | Linear Prediction - orthogonality of prediction error with past samples, optimal linear predictor, Yule-Walker equations, energy of prediction error, stability of prediction filter, autoregressive process, linear prediction for AR processes. Ref - Theory of LP - Vaidyanathan [Chap - 2, 5.3, A, B] |
|
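A compact statement of the normal (Yule-Walker) equations referenced above, using one common sign convention (predictor coefficients a_k, autocorrelation R(m)); some texts, including the Vaidyanathan notes, define the coefficients with the opposite sign.

\hat{x}[n] = \sum_{k=1}^{p} a_k\, x[n-k], \qquad e[n] = x[n] - \hat{x}[n]

\mathbb{E}\{e[n]\, x[n-m]\} = 0 \;\Rightarrow\; \sum_{k=1}^{p} a_k\, R(m-k) = R(m), \quad m = 1,\dots,p

E_p = \mathbb{E}\{e^2[n]\} = R(0) - \sum_{k=1}^{p} a_k\, R(k)

The orthogonality of the prediction error with the past samples is what turns the minimization into this linear system in the autocorrelation values.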
22-08-2016 | Normal equations for autoregressive processes. Power spectral density. Autoregressive modeling of the PSD. Applications of linear prediction. First assignment - Non-negative Matrix Factorization, Linear Prediction, applications to face images and noisy speech. Due Date - 02-09-2016 (Noon) |
slides
HW1.pdf images.zip speech.zip |
24-08-2016 | Matrix derivative rules. Dimensionality Reduction I - Principal component analysis (PCA), maximum variance formulation, minimum error formulation. Whitening and standardization, PCA for high dimensional data. Linear discriminant analysis (LDA), Fisher discriminant for two classes. Ref - PRML - Bishop |
|
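A small illustrative sketch of PCA via eigendecomposition of the sample covariance (the maximum-variance formulation above); the data matrix X is assumed to be samples-by-dimensions.

import numpy as np

def pca(X, n_components):
    # Principal directions = leading eigenvectors of the sample covariance.
    mean = X.mean(axis=0)
    Xc = X - mean                                # center the data
    cov = Xc.T @ Xc / (X.shape[0] - 1)           # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]            # largest variance first
    W = eigvecs[:, order[:n_components]]         # projection matrix
    return Xc @ W, W, mean                       # scores, basis, mean

Whitening additionally divides each projected coordinate by the square root of its eigenvalue; for high-dimensional data with few samples, the eigendecomposition can be done on the smaller Gram matrix Xc Xc^T instead.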
29-08-2016 | LDA for multiple classes, LDA formulation in lower dimensional subspace. Applications of PCA. Distinction between PCA and LDA. Introduction to feature extraction from image data - Wavelet transform, mother wavelet, scaling and shifting, Continuous and Dyadic Wavelet Transform. Ref - Introduction to Wavelets and Wavelet Transforms - Burrus et al. |
|
31-08-2016 | Dyadic Wavelet Transform. Scaling and wavelet functions. Approximation and detail. Wavelet decomposition. Application to 1-D signals. Ref - Selected Pages - Burrus et al (Chap. 2) |
handout |
05-09-2016 | Interpreting wavelet approximation and detail coefficients. Filter bank approach to wavelets. Extension to the 2-D Wavelet Transform, application to images. Ref - Tutorial on 2-D Wavelets, Image Denoising |
handout |
07-09-2016 | Decision Theory - Inference and decision rule, mis-classification error, maximum posterior decision rule, expected loss, minimum mean square error decision rule for regression. Three approaches to inference and decision - Generative modeling, Discriminative modeling and Discriminant Functions. Ref - PRML - Bishop (Sec. 1.5) |
|
12-09-2016 | Introduction to generative modeling. Gaussian distribution. Parameter estimation using maximum likelihood (MLE). Sample mean and sample covariance. Limitations of Gaussian modeling. Gaussian mixture model (GMM) density function. |
slides |
14-09-2016 | MLE for GMM - Expectation Maximization (EM) algorithm. Proof of the EM algorithm. Convergence properties. EM algorithm for GMM parameter estimation. Choice of hidden variable. Application of GMMs to unsupervised data clustering. Ref - Tutorial GMMs, Proof of EM algorithm, EM algorithm for GMMs |
slides |
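One EM iteration for a GMM, as a rough sketch of the update equations covered above; names and the use of scipy's density are assumptions for the example.

import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, weights, means, covs):
    # E-step: responsibilities p(component k | x_n); M-step: re-estimate parameters.
    N, K = X.shape[0], len(weights)
    resp = np.zeros((N, K))
    for k in range(K):
        resp[:, k] = weights[k] * multivariate_normal.pdf(X, means[k], covs[k])
    resp /= resp.sum(axis=1, keepdims=True)
    Nk = resp.sum(axis=0)                        # effective counts per component
    weights = Nk / N
    means = (resp.T @ X) / Nk[:, None]
    covs = [(resp[:, k, None] * (X - means[k])).T @ (X - means[k]) / Nk[k]
            for k in range(K)]
    return weights, means, covs

Iterating this step never decreases the data log-likelihood, which is the convergence property established in the lecture via the auxiliary (Q) function.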
19-09-2016 | Markov chain - sequence modeling with hidden Markov models (HMM). Definition of HMM parameters. Three problems in HMM: (i) evaluation, (ii) inference and (iii) training. Direct computation of likelihood. Forward and backward variable recursion.
Ref - Rabiner, Juang, "Fundamentals of speech recognition", Chap 6. Ref - SP Magazine Article - Rabiner |
|
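The forward/backward recursions mentioned above, in the usual Rabiner notation (transitions a_{ij}, emissions b_j(o_t), initial probabilities \pi_i):

\alpha_1(i) = \pi_i\, b_i(o_1), \qquad \alpha_{t+1}(j) = \Big[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\Big]\, b_j(o_{t+1}), \qquad P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)

\beta_T(i) = 1, \qquad \beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)

Both recursions cost O(N^2 T), as opposed to the O(N^T) direct enumeration of state sequences.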
21-09-2016 | Solution to problem (ii) in HMM - the Viterbi algorithm. HMM parameter estimation with the EM algorithm. Estimation of the Q function and iterative model update. Ref - Tutorial HMMs; Rabiner, Juang, "Fundamentals of speech recognition", Chap 6. Second assignment (Part A) - PCA/LDA, ML, Gaussian and GMM, HMM. Due Date - 03-10-2016 (Class) |
HW2-a.pdf |
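An illustrative log-domain Viterbi decoder for a discrete-observation HMM, matching the recursion discussed above; the parameter names (log_pi, log_A, log_B) are assumptions for the sketch.

import numpy as np

def viterbi(obs, log_pi, log_A, log_B):
    # delta[t, j]: best log-probability of any state path ending in state j at time t.
    T, N = len(obs), len(log_pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)            # back-pointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # rows: previous state, cols: current state
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):                # follow back-pointers
        path.append(int(psi[t][path[-1]]))
    return path[::-1], float(delta[-1].max())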
26-09-2016 | First Mid-term Exam |
|
28-09-2016 |
Discussion on first mid-term exam. Topics for mini-projects |
Project list |
03-10-2016 | Hidden Markov Models with GMM observation densities. Application of the EM algorithm for GMM-HMMs. Parameter estimation. Application of HMMs in video analysis. Dimensionality reduction continued - latent variable models. Refs - GMM-HMM - "Fundamentals of Speech Recognition", Rabiner, Chap 6. Slides from N. Ramanathan - Video analysis with HMMs |
|
05-10-2016 | Probabilistic PCA (PPCA) - generative model description. Log-likelihood computation, parameter estimation using direct optimization. EM algorithm for PPCA. Extension to factor analysis. Summary of generative modeling. Introduction to discriminative modeling - non-linear regression with kernels. Ref - PRML - Bishop (Sec. 12.2, 3.1) Paper - "PPCA", Tipping et al |
|
12-10-2016 | Recap of generative versus discriminative modeling. Non-linear regression with regularization. Dual problem definition and solution with kernels. Properties of kernel functions. Constructing kernels from basic blocks. Sparse kernel machines. Ref - PRML - Bishop (Sec. 3.3, 6) |
slides |
17-10-2016 |
Classifiers with kernels. Definition of margin. Maximum margin classifiers. Introduction to convex optimization with constraints. Primal and dual problems. Weak and strong duality. Karush-Kuhn-Tucker (KKT) conditions for strong duality. Solving the dual problem for maximum margin classifiers. Definition of support vectors.
Ref - PRML - Bishop (Chap 7.1) Book (Chap 5) - "Convex Optimization", Boyd and Vandenberghe. Second assignment (Part B) - Implementing PCA/LDA, GMM and HMM. Due Date - 28-10-2016 (Noon) |
HW2-b.pdf |
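For reference, the maximum-margin primal and its kernelized dual in Bishop's notation (targets t_n in {-1, +1}, Lagrange multipliers a_n):

\min_{\mathbf{w},\, b}\ \tfrac{1}{2}\|\mathbf{w}\|^2 \quad \text{s.t.}\quad t_n\big(\mathbf{w}^\top \phi(\mathbf{x}_n) + b\big) \ge 1, \quad n = 1,\dots,N

\max_{\mathbf{a}}\ \sum_{n=1}^{N} a_n - \tfrac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} a_n a_m\, t_n t_m\, k(\mathbf{x}_n, \mathbf{x}_m) \quad \text{s.t.}\quad a_n \ge 0,\ \ \sum_{n=1}^{N} a_n t_n = 0

The KKT conditions force a_n = 0 for points outside the margin, so only the support vectors (a_n > 0) enter the prediction \sum_n a_n t_n k(\mathbf{x}, \mathbf{x}_n) + b.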
19-10-2016 |
Maximum margin classifiers for overlapping class distributions, concept of slack variables. Lagrangian and dual form. KKT conditions for solving the optimal parameters. Sequential minimal optimization algorithm - analytic solution to two variable constrained optimization problem, heuristics for choosing the two variables. Estimating the bias parameter in SVM.
Ref - PRML - Bishop (Chap 7.1.1) SMO paper - J. Platt et al. |
|
24-10-2016 |
Summary of support vector machines - problem definition, primal and dual formulations, kernel space transformation, solutions and implications, applications of SVMs in cancer diagnosis and text categorization. Support vector regression - slack variables and dual formulation.
Ref - PRML - Bishop (Chap 7.1.4) NYU Bio medicine - Tutorial |
slides |
31-10-2016 |
Introduction to neural networks. Illustration with the XOR problem - need for hidden layer(s) with non-linear activations. Optimization methods for neural networks. First-order Taylor series - gradient descent method. Curvature and second derivatives. Jacobian and Hessian matrices. Newton's method. Stochastic gradient descent.
Ref - DLB (Deep Learning Book) - Goodfellow, Bengio (Chap 6, Chap 4.3) |
|
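A bare-bones sketch of mini-batch stochastic gradient descent as described above; grad_fn is a hypothetical function returning gradients with the same structure as the parameter list.

import numpy as np

def sgd(params, grad_fn, data, lr=0.01, epochs=10, batch_size=32, seed=0):
    # Repeatedly step each parameter against the mini-batch gradient of the loss.
    rng = np.random.default_rng(seed)
    n = len(data)
    for _ in range(epochs):
        order = rng.permutation(n)               # reshuffle every epoch
        for start in range(0, n, batch_size):
            batch = [data[i] for i in order[start:start + batch_size]]
            grads = grad_fn(params, batch)       # one gradient per parameter
            params = [p - lr * g for p, g in zip(params, grads)]
    return params

Newton's method would instead multiply the gradient by the inverse Hessian, capturing the curvature discussed in the lecture, but is rarely practical for large networks.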
02-11-2016 |
Neural networks estimate posterior probabilities. Architecture considerations - cost function (mean square error, cross entropy), output units (linear, sigmoidal or softmax), hidden unit activations (ReLU and variants, tanh or sigmoidal).
Ref - DLB (Deep Learning Book) - Goodfellow, Bengio (Chap 6) |
|
07-11-2016 |
Universal approximation properties of NNs. Need for multiple hidden layers. Depth versus width. Mechanism of representation learning in deep networks. Parameter learning in deep networks - back propagation. Equivalence between training DNNs with linear output activations and MSE cost versus softmax outputs with cross-entropy cost.
Ref - DLB (Deep Learning Book) - Goodfellow, Bengio (Chap 6) ASR - DL approach, D. Yu, Li Deng (Chap 4). |
|
09-11-2016 |
Summary of NN learning and architecture. Pseudo-code for back propagation. Other considerations - data preprocessing, model initialization. Underfitting versus overfitting. Improving generalization with regularization. L2 regularization, quadratic approximation of the objective and its effect on the regularized solution.
Ref - DLB (Deep Learning Book) - Goodfellow, Bengio (Chap 6, 7) |
slides |
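The quadratic-approximation argument for L2 regularization (weight decay) from DLB Chap. 7.1, completing the point above: expand the unregularized cost J around its minimizer w* with Hessian H, add the penalty, and minimize.

\hat{J}(\mathbf{w}) \approx J(\mathbf{w}^*) + \tfrac{1}{2}(\mathbf{w}-\mathbf{w}^*)^\top H\,(\mathbf{w}-\mathbf{w}^*) + \tfrac{\alpha}{2}\|\mathbf{w}\|^2 \;\Rightarrow\; \tilde{\mathbf{w}} = (H + \alpha I)^{-1} H\, \mathbf{w}^*

Along an eigenvector of H with eigenvalue \lambda_i the solution is shrunk by \lambda_i / (\lambda_i + \alpha), so directions of low curvature are damped the most.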
14-11-2016 |
L1 regularization. Multi-task learning. Early stopping. Equivalence between L2 regularization and early stopping. Bagging and ensemble averaging. Dropout. Ref - DLB (Deep Learning Book) - Goodfellow, Bengio (Chap 7) |
|
16-11-2016 |
Convolutional neural networks. Filtering and hierarchical sparsity. Pooling and striding. Deep convolutional networks. Discussion on the second mid-term exam. Ref - DLB (Deep Learning Book) - Goodfellow, Bengio (Chap 9) |
|
21-11-2016 |
Deep Generative Models - Restricted Boltzmann Machine, model definition, conditional independence. Relationship with sigmoidal activation.
Parameter learning in RBM - positive and negative phase, approximation with sampling methods, contrastive divergence algorithm.
Deep Belief Networks (DBNs). Ref - DLB (Deep Learning Book) - Goodfellow, Bengio (Chap 18,20) |
|
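A rough sketch of one contrastive-divergence (CD-1) update for a binary RBM, following the positive/negative-phase description above; array shapes and names are illustrative assumptions.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_vis, b_hid, lr=0.01, rng=np.random.default_rng(0)):
    # Positive phase: hidden activations conditioned on the data batch v0 (batch x visible).
    ph0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)   # sample binary hidden units
    # Negative phase: one Gibbs step back to a reconstruction of the visible units.
    pv1 = sigmoid(h0 @ W.T + b_vis)
    ph1 = sigmoid(pv1 @ W + b_hid)
    # Approximate log-likelihood gradient: positive-phase minus negative-phase statistics.
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / v0.shape[0]
    b_vis += lr * (v0 - pv1).mean(axis=0)
    b_hid += lr * (ph0 - ph1).mean(axis=0)
    return W, b_vis, b_hid

Stacking RBMs trained this way and fine-tuning with back propagation is the DBN pre-training recipe referred to in the lecture.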
23-11-2016 |
Gaussian Restricted Boltzmann Machine (GRBM). Relationship with GMMs. Summary of Deep learning methods. Ref - ASR - Deep Learning Approach (Yu and Deng) - (Chap 5) |
slides |
29-11-2016 |
Take Home Practice Exam
|
Q-paper |