As data analysis plays an increasingly important role in learning analytics (LA), this year's workshop will offer its first joint activity: predicting student performance by analysing reading patterns in the logs of an e-book system.

A dataset of anonymised reading logs is provided below for building models that predict each student's final grade score. Participants are encouraged to share their results and insights by submitting a paper for presentation at the workshop.


By downloading and using our dataset, you agree to our Terms of Use.

The dataset for this joint activity, in which you will predict the performance of students in two different courses (data1 and data2), includes these 4 files:

data#_clickstream.csv (2 files), which contain the logged activity data from students' interactions with the BookRoll system.

data#_score.csv (2 files), which contain the final score for each student. These scores should be used as the labels for training and testing prediction models.

For a more detailed description of the columns, please refer to the README file included in the dataset download.
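As a possible starting point for feature engineering, the sketch below counts how many times each student triggered each type of event in a clickstream file and joins those counts with the final scores. It is a minimal sketch under stated assumptions: the column names `userid`, `operationname`, and `score` are hypothetical placeholders (replace them with the names documented in the README), the sample rows are made up, and the helper functions are illustrative rather than part of the dataset distribution. Only the Python standard library is used.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical column names: replace them with those documented in the README.
USER_COL = "userid"          # assumed student-identifier column
EVENT_COL = "operationname"  # assumed event-type column (clickstream files)
SCORE_COL = "score"          # assumed final-score column (score files)


def count_events(clickstream_file):
    """Return {student_id: Counter(event_type -> count)} for one clickstream CSV."""
    counts = defaultdict(Counter)
    for row in csv.DictReader(clickstream_file):
        counts[row[USER_COL]][row[EVENT_COL]] += 1
    return counts


def load_scores(score_file):
    """Return {student_id: final_score} for one score CSV."""
    return {row[USER_COL]: float(row[SCORE_COL]) for row in csv.DictReader(score_file)}


def build_dataset(clickstream_file, score_file):
    """Join per-student event counts with final scores.

    Returns (feature_names, X, y); students with no logged events get all-zero rows.
    """
    counts = count_events(clickstream_file)
    scores = load_scores(score_file)
    feature_names = sorted({event for c in counts.values() for event in c})
    X, y = [], []
    for student in sorted(scores):
        c = counts.get(student, Counter())  # empty Counter yields 0 for every event
        X.append([c[name] for name in feature_names])
        y.append(scores[student])
    return feature_names, X, y


# Tiny made-up example standing in for data1_clickstream.csv / data1_score.csv.
clicks = io.StringIO("userid,operationname\nu1,OPEN\nu1,NEXT\nu1,NEXT\nu2,OPEN\n")
grades = io.StringIO("userid,score\nu1,80\nu2,55\nu3,40\n")
names, X, y = build_dataset(clicks, grades)
print(names)  # ['NEXT', 'OPEN']
print(X)      # [[2, 1], [0, 1], [0, 0]]
print(y)      # [80.0, 55.0, 40.0]
```

For the real files, pass file objects opened with `open("data1_clickstream.csv", newline="")` and `open("data1_score.csv", newline="")` instead of the in-memory strings. Students who appear in the score file but never in the clickstream are kept with all-zero features, so every labelled student remains in the dataset; raw event counts are only one simple choice of feature.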

A link to download the dataset will be provided after you have registered your contact information and agreed to the Terms of Use.


For more information about BookRoll and the learning analytics platform on which the data was collected, please refer to the following:

  • Hiroaki Ogata, Chengjiu Yin, Misato Oi, Fumiya Okubo, Atsushi Shimada, Kentaro Kojima, and Masanori Yamada, E-Book-based learning analytics in university education, Proceedings of the 23rd International Conference on Computers in Education (ICCE 2015), pp. 401-406, 2015.
  • Digital teaching material delivery system "BookRoll"
  • Brendan Flanagan and Hiroaki Ogata, Integration of Learning Analytics Research and Production Systems While Protecting Privacy, Proceedings of the 25th International Conference on Computers in Education (ICCE 2017), pp. 333-338, 2017.
  • Hiroaki Ogata, Misato Oi, Kousuke Mohri, Fumiya Okubo, Atsushi Shimada, Masanori Yamada, Jingyun Wang, and Sachio Hirokawa, Learning Analytics for E-Book-Based Educational Big Data in Higher Education, in Smart Sensors at the IoT Frontier, pp. 327-350, Springer, Cham, 2017.

Recommended Evaluation Method

We recommend evaluating models that predict student performance with two metrics: AUC and RMSE. Each metric should be reported as the average over 10 runs of 3-fold cross-validation, with the data randomly re-partitioned into folds for each run.
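The recommended protocol can be sketched as follows: shuffle the students, split them into 3 folds, train on two folds and evaluate on the third, repeat for every fold, then redo the whole procedure 10 times with a new random partition and average the results. This is a minimal standard-library sketch, not an official evaluation script. AUC is only defined for a binary outcome, and this page does not say how final scores should be binarized, so the `threshold` parameter (scores at or above it count as the positive class) is an assumption. `repeated_kfold`, `fit_predict`, and `mean_baseline` are hypothetical names; any model that returns one numeric prediction per test student can be plugged in.

```python
import math
import random
from statistics import mean


def rmse(y_true, y_pred):
    """Root-mean-square error between true and predicted scores."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def auc(labels, scores):
    """ROC AUC via the Mann-Whitney statistic (ties count as half a win)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        return float("nan")  # undefined when the fold contains only one class
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_baseline(X_train, y_train, X_test):
    """Hypothetical baseline model: predict the training mean for every test student."""
    m = mean(y_train)
    return [m] * len(X_test)


def repeated_kfold(X, y, fit_predict, threshold, k=3, repeats=10, seed=0):
    """Average AUC and RMSE over `repeats` runs of k-fold cross-validation.

    fit_predict(X_train, y_train, X_test) must return one numeric prediction
    per test row. For AUC, a student counts as positive when the true score
    is >= threshold (an assumed binarization).
    """
    rng = random.Random(seed)
    idx = list(range(len(y)))
    aucs, rmses = [], []
    for _ in range(repeats):
        rng.shuffle(idx)  # new random partition for every run
        folds = [idx[i::k] for i in range(k)]
        for test in folds:
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                                [X[i] for i in test])
            truth = [y[i] for i in test]
            rmses.append(rmse(truth, preds))
            aucs.append(auc([1 if t >= threshold else 0 for t in truth], preds))
    valid = [a for a in aucs if not math.isnan(a)]  # skip single-class folds
    return (mean(valid) if valid else float("nan")), mean(rmses)


# Example: the constant baseline ranks every student equally, so its AUC is 0.5.
X_demo = [[i] for i in range(12)]
y_demo = [float(10 * i) for i in range(12)]
mean_auc, mean_rmse = repeated_kfold(X_demo, y_demo, mean_baseline, threshold=50.0)
```

Because each run contributes exactly three folds, averaging the 30 per-fold scores gives the same result as averaging the 10 per-run means (apart from any single-class folds skipped for AUC). Fixing `seed` keeps the random partitions identical across the models being compared.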

Terms related to working with anonymized data

While the LA@ICCE2018 Workshop provides anonymized data to researchers, it may sometimes be possible to link data that was intended to be anonymous back to individuals.

You agree, as a condition of using the LA@ICCE2018 datasets, to the following terms and conditions meant to ensure that student data remains anonymous:

  1. The dataset is only to be used for the purposes of this workshop, which are to support educational and learning activities.
  2. After the workshop has finished, the dataset is to be deleted from all systems.
  3. You will not use the data to discover personally identifiable information about the individual students in the study.
  4. If you discover something that can identify students personally, you will delete it from your computer and immediately inform the LA@ICCE2018 Workshop Organizer (Brendan Flanagan - laicce2018 [at] gmail [dot] com). You will work with the LA@ICCE2018 Workshop Organizer to make sure that data intended to be anonymous is in fact anonymous.
  5. You agree not to give this data to any third party.
  6. You agree not to commercialize this data or use it in a malicious manner.