Text Mining, 6 credits

TDDE16

Main field of study

Information Technology, Computer Science and Engineering, Computer Science

Course level

Second cycle

Course type

Programme course

Examiner

Marco Kuhlmann

Director of studies or equivalent

Ann-Charlotte Hallberg

Education components

Preliminary scheduled hours: 0 h
Recommended self-study hours: 160 h
ECV = Elective / Compulsory / Voluntary
Course offered for | Semester | Period | Timetable module | Language | Campus | ECV
6CDDD Computer Science and Engineering, M Sc in Engineering | 9 (Autumn 2017) | 2 | 2 | English | Linköping, Valla | E
6CDDD Computer Science and Engineering, M Sc in Engineering (AI and Machine Learning) | 9 (Autumn 2017) | 2 | 2 | English | Linköping, Valla | E
6CMJU Computer Science and Software Engineering, M Sc in Engineering | 9 (Autumn 2017) | 2 | 2 | English | Linköping, Valla | E
6CMJU Computer Science and Software Engineering, M Sc in Engineering (AI and Machine Learning) | 9 (Autumn 2017) | 2 | 2 | English | Linköping, Valla | E
6MDAV Computer Science, Master's programme | 3 (Autumn 2017) | 2 | 2 | English | Linköping, Valla | E
6MICS Computer Science, Master's programme | 3 (Autumn 2017) | 2 | 2 | English | Linköping, Valla | E
6CITE Information Technology, M Sc in Engineering | 9 (Autumn 2017) | 2 | 2 | English | Linköping, Valla | E
6CITE Information Technology, M Sc in Engineering (AI and Machine Learning) | 9 (Autumn 2017) | 2 | 2 | English | Linköping, Valla | E

Advancement level

A1X

Course offered for

  • Computer Science and Software Engineering, M Sc in Engineering
  • Computer Science and Engineering, M Sc in Engineering
  • Information Technology, M Sc in Engineering
  • Computer Science, Master's programme

Entry requirements

Note: Admission requirements for non-programme students usually also include admission requirements for the programme and threshold requirements for progression within the programme, or corresponding.

Prerequisites

Mathematical analysis; Linear Algebra; Probability and Statistics; Machine Learning; Basic programming.

Intended learning outcomes

The overall aim of the course is to provide an introduction to the quantitative analysis of text, with a special focus on applying machine learning methods to text documents. The student will learn the main steps of working with text: i) efficient extraction of text, ii) natural language processing that brings the text into a form suitable for iii) statistical machine learning methods, which are subsequently used for iv) text prediction.
After completing the course, the student should be able to:

  • use basic methods for information extraction and retrieval of textual data
  • apply text processing techniques to prepare documents for statistical modelling
  • apply relevant machine learning models to analyze textual data and correctly interpret the results
  • use machine learning models for text prediction
  • evaluate the performance of machine learning models for textual data
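
As a rough illustration of how the steps above fit together, the following sketch preprocesses a few documents into bags of words, trains a classifier, predicts labels, and evaluates accuracy. It is not taken from the course material: the choice of a multinomial naive Bayes model with add-one smoothing, the toy reviews, and all function names (`tokenize`, `train_naive_bayes`, `predict`, `accuracy`) are assumptions made for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the given label."""
    hits = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return hits / len(docs)


# Hypothetical toy data, invented for this sketch.
train_docs = [
    "great film, loved it",
    "wonderful acting and great story",
    "terrible plot, boring film",
    "boring and awful acting",
]
train_labels = ["pos", "pos", "neg", "neg"]
model = train_naive_bayes(train_docs, train_labels)
```

Calling `predict(model, "great story")` classifies a new document, and `accuracy` can be applied to held-out data. Evaluating on the training documents themselves, as the toy data would force here, overstates performance.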

 

Course content

Introduction and overview of quantitative text analysis and its applications. Information extraction. Web crawling. Information retrieval. Tf-idf. Vector space models. Text preprocessing. Bag of words. N-grams. Sparsity and smoothing for text. Document classification. Sentiment analysis. Model evaluation. Topic models.
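
To make two of the listed topics concrete, the sketch below computes tf-idf weights and compares documents by cosine similarity in a vector space model. It is an illustrative assumption, not course material: it uses one common variant (relative term frequency times `log(N / df)`, without smoothing), and the function names and example sentences are invented for this example.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Map each tokenized document to a {term: tf-idf weight} dictionary."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical example documents, already split into tokens.
docs = [
    "the cat sat on the mat".split(),
    "the cat chased the mouse".split(),
    "stock prices fell on monday".split(),
]
vectors = tf_idf_vectors(docs)
```

With this weighting, a term that occurs in every document receives weight zero, so it does not contribute to the similarity between documents.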
 

Teaching and working methods

The course consists of lectures, computer laboratory work and an individual project. The lectures introduce concepts and theories that students then use in problem solving at the computer labs and in the project work.

Examination

Code | Name | Scope | Grading scale
PRA1 | Project | 3 credits | U, 3, 4, 5
LAB1 | Laboratory exercises | 3 credits | U, G

UPG1 consists of computer exercises that test the students' ability to translate theoretical knowledge into practical problem solving in machine learning.
UPG2 is an individual project in which the student solves a real-world problem involving text. The project is documented in a written project report, which forms the basis for its evaluation.

Grades

Four-grade scale, LiU, U, 3, 4, 5

Department

Institutionen för datavetenskap (Department of Computer and Information Science)

Regulations (apply to LiU in its entirety)

The university is a government agency whose operations are regulated by legislation and ordinances, which include the Higher Education Act and the Higher Education Ordinance. In addition to legislation and ordinances, operations are subject to several policy documents. The Linköping University rule book collects currently valid decisions of a regulatory nature taken by the university board, the vice-chancellor and faculty/department boards.

LiU’s rule book for education at first-cycle and second-cycle levels is available at http://styrdokument.liu.se/Regelsamling/Innehall/Utbildning_pa_grund-_och_avancerad_niva. 

There is no course literature available for this course in studieinfo.

Note: The course matrix might contain more information in Swedish.

I = Introduce, U = Teach, A = Utilize
Section | I U A | Modules | Comment
1. DISCIPLINARY KNOWLEDGE AND REASONING
1.1 Knowledge of underlying mathematics and science (G1X level) | X X X | LAB1, PRA1 |
1.2 Fundamental engineering knowledge (G1X level) | X X X | LAB1, PRA1 |
1.3 Further knowledge, methods, and tools in one or several subjects in engineering or natural science (G2X level) | | |
1.4 Advanced knowledge, methods, and tools in one or several subjects in engineering or natural sciences (A1X level) | | |
1.5 Insight into current research and development work | | |
2. PERSONAL AND PROFESSIONAL SKILLS AND ATTRIBUTES
2.1 Analytical reasoning and problem solving | X X X | LAB1, PRA1 |
2.2 Experimentation, investigation, and knowledge discovery | X X X | LAB1, PRA1 |
2.3 System thinking | X X X | PRA1 |
2.4 Attitudes, thought, and learning | | |
2.5 Ethics, equity, and other responsibilities | | |
3. INTERPERSONAL SKILLS: TEAMWORK AND COMMUNICATION
3.1 Teamwork | X | PRA1 |
3.2 Communications | X | PRA1 |
3.3 Communication in foreign languages | | |
4. CONCEIVING, DESIGNING, IMPLEMENTING AND OPERATING SYSTEMS IN THE ENTERPRISE, SOCIETAL AND ENVIRONMENTAL CONTEXT
4.1 External, societal, and environmental context | X | |
4.2 Enterprise and business context | | |
4.3 Conceiving, system engineering and management | | |
4.4 Designing | | PRA1 |
4.5 Implementing | | |
4.6 Operating | | |
5. PLANNING, EXECUTION AND PRESENTATION OF RESEARCH DEVELOPMENT PROJECTS WITH RESPECT TO SCIENTIFIC AND SOCIETAL NEEDS AND REQUIREMENTS
5.1 Societal conditions, including economic, social, and ecological aspects of sustainable development for knowledge development | | |
5.2 Economic conditions for knowledge development | | |
5.3 Identification of needs, structuring and planning of research or development projects | | |
5.4 Execution of research or development projects | | |
5.5 Presentation and evaluation of research or development projects | | |
