Talk Yasuo Tabei, March 6, 2017
Scalable partial least squares regression on grammar-compressed data matrices
Japan Science and Technology Agency
Mon 6 Mar 2017 at 11:00-11:45
With massive high-dimensional data now commonplace in research and industry, there is a strong and growing demand for more scalable computational techniques for data analysis and knowledge discovery. Key to turning these data into knowledge is the ability to learn highly interpretable statistical models. Current methods for learning statistical models either produce models that are not interpretable or incur prohibitive computational costs on massive data. In this talk we address this need by presenting a scalable algorithm for partial least squares regression (PLS), which we call compression-based PLS (cPLS), that learns highly interpretable predictive linear models from massive high-dimensional data. We propose a novel grammar-compressed representation of data matrices that supports fast row and column access while the matrix remains compressed. The original data matrix is grammar compressed, and the linear model in PLS is then learned directly on the compressed matrix. This significantly reduces working space and greatly improves scalability. We experimentally test the ability of cPLS to learn linear models for classification, regression and feature extraction on various massive high-dimensional data sets, and show that cPLS achieves superior prediction accuracy, computational efficiency, and interpretability.
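To illustrate why row and column access can be enough to learn a PLS model, here is a minimal sketch of a textbook single-response PLS (PLS1) variant that never modifies or deflates the data matrix. It is not the speaker's cPLS implementation. The class ColumnStore and the helper names are hypothetical, and the store simply keeps plain Python lists where a real system would decode columns from a grammar-compressed structure.

```python
import math


class ColumnStore:
    """Hypothetical stand-in for a compressed matrix: only column access is exposed."""

    def __init__(self, columns):
        self._columns = [list(c) for c in columns]
        self.n_cols = len(self._columns)
        self.n_rows = len(self._columns[0])

    def column(self, j):
        # A real compressed structure would decode column j here (assumption).
        return self._columns[j]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def times(store, w):
    """Compute X @ w while touching X one column at a time."""
    out = [0.0] * store.n_rows
    for j in range(store.n_cols):
        if w[j] != 0.0:
            col = store.column(j)
            for i in range(store.n_rows):
                out[i] += w[j] * col[i]
    return out


def transpose_times(store, r):
    """Compute X.T @ r while touching X one column at a time."""
    return [dot(store.column(j), r) for j in range(store.n_cols)]


def pls1(store, y, n_components):
    """PLS1 with orthonormal scores; returns coefficients beta with fitted = X @ beta."""
    scores, directions = [], []  # t_i and v_i, kept so that t_i = X @ v_i
    beta = [0.0] * store.n_cols
    residual = list(y)
    for _ in range(n_components):
        w = transpose_times(store, residual)  # weight vector X.T @ residual
        t, v = times(store, w), list(w)
        # Orthogonalise the new score against earlier scores (Gram-Schmidt),
        # applying the same combination to v so that t = X @ v still holds.
        for t_prev, v_prev in zip(scores, directions):
            c = dot(t_prev, t)
            t = [a - c * b for a, b in zip(t, t_prev)]
            v = [a - c * b for a, b in zip(v, v_prev)]
        norm = math.sqrt(dot(t, t))
        t = [a / norm for a in t]
        v = [a / norm for a in v]
        q = dot(y, t)  # regression of y on the new score
        beta = [b + q * a for b, a in zip(beta, v)]
        residual = [a - q * b for a, b in zip(residual, t)]
        scores.append(t)
        directions.append(v)
    return beta
```

The model is fitted using only the products X @ w and X.T @ r, so the full matrix is never materialised in uncompressed form. With as many components as linearly independent columns, this sketch reproduces the ordinary least squares fit without an intercept.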