Mere decades ago, statisticians endeavoured to produce reliable inferences from small data sets. With the widespread availability of computers and data storage, however, individuals, governments and businesses now generate immense quantities of data, in some cases more than can be effectively stored and analyzed. There is so much data today that the world’s data centres account for over one per cent of global power use.
Why we need both humans and computers to make data useful
Big data can be immensely valuable for predicting human needs and behaviour, refining products and understanding global phenomena such as climate change. Yet, without statistical cunning, it is hard to separate the signal from the noise. To make the data useful, computer scientists and statisticians have developed sophisticated techniques for machine learning and data mining that parse and shape the data to answer specific questions about how things are and to predict how things will be.
Machine learning concerns the development of new algorithms that enable computers to learn automatically from data how to solve complex tasks, such as classifying images or driving a car. Data mining concerns the development of tools that enable people to ask questions of massive quantities of data and answer them, ultimately to support smart, informed decisions.
Helping undergraduates get an early start on data literacy
Historically, students gained the knowledge and skills required for such high-level programming and data analysis only in graduate programs, which means that demand for these skills far outstrips the supply of people who have them. UTSC’s Department of Computer and Mathematical Sciences is set to meet the demand, offering Canada’s first undergraduate program in statistics with a stream dedicated to machine learning and data mining.
Beyond teaching the basics, the cutting-edge program will introduce students to the latest research and the latest developments in the industry. The department recently hired Daniel Roy, an expert in machine learning and the emerging area of probabilistic programming, and Ashton Anderson, an expert in devising ways to glean valuable information that predicts the behaviour of users of massive social networks.
“Tremendous demand for students who have these skills”
As Professor and Chair David Fleet explains, his multidisciplinary department is uniquely suited to provide such a program, bringing together the exact mix of expertise required to crunch the data: statisticians, mathematicians and computer scientists. As the first cohort of graduates emerges, industry, governments and graduate schools will inevitably snap them up.
Fleet sees “a demand for a new generation of young professionals who are extremely well schooled in machine learning and data mining, and who will push the envelope of what we can do with big data. We believe there will be tremendous demand for students who have these skills.” The department’s other stream, in quantitative finance, has already proven immensely popular with students.
The Statistical Machine Learning and Data Mining stream will be in high demand among students. But for those lucky enough to get in, there is no telling what challenges their skills will address as they draw sharp signals out of the noise of big data.