Module 6: Big Data and Machine Learning


Big Data in Finance

  • What is Data science?
  • Supervised and unsupervised learning
  • Structured and unstructured data
  • Introduction to Classification
  • Bayesian models and inference using Markov chain Monte Carlo
  • Introduction to graphical models: Bayesian networks, Markov networks, inference in graphical models
  • Optimisation techniques
  • Examples: Predictive analytics/trading & Pricing

Classification, Clustering and filtering

  • Classification: K-nearest neighbours, optimal Bayes classifier, naïve Bayes, LDA and QDA, reduced rank LDA, Logistic regression, Support Vector Machines
  • Cluster analysis: BIRCH, Hierarchical, K-means, Expectation-maximization, DBSCAN, OPTICS and Mean-shift
  • Kalman filtering
  • Examples (2 worked practical examples)

Machine Learning & Predictive Analytics

  • Regression: linear regression, bias-variance decomposition, subset selection, shrinkage methods, regression in high dimensions
  • Support Vector Machines: Classification and regression using SVMs and kernel methods
  • Dimension reduction: Principal component analysis (PCA), kernel PCA, non-negative matrix factorization, PageRank
  • Examples (2 worked examples)

Machine Learning and Data Lab

  • Sandbox: conda, environments, Python and R packages, MLlib. Data sources
  • Logistic regression as a classifier: loss function, transition probabilities, softmax and appropriate penalty (Ridge regression)
  • Cross-validation: sample selection and reshuffling. Precision and recall. Is the classifier random?
  • Support Vector Machines: hyperplane intuition, soft vs hard margin. Choice of kernel to tackle non-linear problems
  • Random Forest Classifiers: regression versions of Decision Trees and AdaBoost
  • Vignettes on neural nets to predict market returns, probabilistic programming, and Markov-switching GARCH

Co-Integration using R

  • Multivariate time series analysis
  • Financial time series: stationary and unit root
  • Vector Autoregression, a theory-free model
  • Equilibrium and Error Correction Model
  • Engle-Granger Procedure
  • Cointegrating relationships and their rank
  • Estimation of reduced rank regression: Johansen Procedure
  • Stochastic modelling of equilibrium: Ornstein-Uhlenbeck process
  • Statistical arbitrage using mean reversion
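
As an illustration of the last two topics, here is a minimal sketch of a mean-reverting process of the form dX = theta (mu - X) dt + sigma dW, simulated with its exact discretization and then fitted by regressing each observation on the previous one. The course itself uses R; Python is used here only for compactness. The function names, parameter values and daily time step are assumptions chosen for the example, not material taken from the course.

```python
import math
import random


def simulate_ou(theta, mu, sigma, x0, dt, n_steps, seed=0):
    """Simulate a mean-reverting path using the exact one-step transition."""
    rng = random.Random(seed)
    decay = math.exp(-theta * dt)
    # Standard deviation of the one-step innovation.
    noise_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * theta))
    path = [x0]
    for _ in range(n_steps):
        path.append(mu + (path[-1] - mu) * decay + noise_sd * rng.gauss(0.0, 1.0))
    return path


def fit_ou(path, dt):
    """Estimate (theta, mu) from the AR(1) regression x[t+1] = a + b * x[t]."""
    x, y = path[:-1], path[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    if not 0.0 < b < 1.0:
        raise ValueError("no mean reversion detected (slope outside (0, 1))")
    theta_hat = -math.log(b) / dt  # speed of mean reversion
    mu_hat = a / (1.0 - b)         # long-run equilibrium level
    return theta_hat, mu_hat


if __name__ == "__main__":
    # Hypothetical parameters: daily steps, roughly 80 years of data.
    spread = simulate_ou(theta=5.0, mu=1.0, sigma=0.5, x0=1.0,
                         dt=1 / 252, n_steps=20000, seed=42)
    print(fit_ou(spread, dt=1 / 252))
```

In a statistical arbitrage setting the fitted path would typically be the residual of a cointegrating regression, and the estimated equilibrium level and reversion speed would feed the entry and exit rules. The speed estimate is noisy in short samples, so it should be read with care.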

From Zero to AI

  • Machine learning methodologies and techniques
  • Supervised classification and prediction
  • Unsupervised feature identification
  • Sequence prediction and computer vision

Statistical Methods for Data Analysis

  • Learning and linear models
  • Linear and multiple linear regression
  • Inference
  • Key assumptions: Linearity; IID random error; Independence of the predictors
  • Diagnostics tests: how to troubleshoot your model
  • Pitfalls in predictions: Confidence interval vs prediction interval; Selection bias; Linear locally, nonlinear globally
  • Beyond linear models
  • Regularization: Ridge regression and the Lasso; Cross validation
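
To make the shrinkage idea concrete, here is a minimal sketch of ridge regression with a single predictor, where the penalised slope has the closed form Sxy / (Sxx + penalty) and the intercept is left unpenalised. The function name and the toy data are assumptions made for illustration.

```python
def ridge_fit(x, y, penalty):
    """Fit y = intercept + slope * x, penalising slope**2 by `penalty`."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / (sxx + penalty)        # penalty = 0 gives ordinary least squares
    intercept = mean_y - slope * mean_x  # intercept is not shrunk
    return intercept, slope


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 6, 8, 10]
    for penalty in (0.0, 10.0, 100.0):
        print(penalty, ridge_fit(x, y, penalty))
```

With a penalty of zero the fit reproduces ordinary least squares; as the penalty grows the slope shrinks towards zero and the fitted line flattens towards the sample mean of y. In practice the penalty would be chosen by cross-validation.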

Lecture order and content may occasionally change due to circumstances beyond our control; however, this will never affect the quality of the program.