Te Kete Ipurangi

Cleaning and formatting data

When you work with data to analyse results and draw conclusions, it is essential that the data is ‘clean’. This means that it is consistent, accurate and complete. Each of these categories is described below.

The person working with the data should be alert to any anomalies, either within the numerical data itself or in the demographic information attached to it. For example, a student number may be missing, a cell may be empty, or the year level for one student may be incorrect. Making sure that data is clean before you start to work with it will help prevent misinterpretation, and will save you from repeating the process if you discover problems further down the track.


Consistent

Each data record must be consistent with the others. If you download information from more than one source, check that every source uses exactly the same format before you combine the data (for example, records for more than one class).
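One way to check that two downloads share a format before combining them is to compare their column headings. The sketch below is a minimal illustration in Python using only the standard library. The column names (`student_id`, `year_level`, `score`), the helper functions and the sample records are all hypothetical and are not taken from any particular student management system.

```python
def same_format(records_a, records_b):
    """Return True if both lists of records use exactly the same column names."""
    columns_a = {key for record in records_a for key in record}
    columns_b = {key for record in records_b for key in record}
    return columns_a == columns_b


def combine(records_a, records_b):
    """Combine two sets of records, refusing to do so if their formats differ."""
    if not same_format(records_a, records_b):
        raise ValueError("Column names differ; fix the format before combining.")
    return records_a + records_b


# Hypothetical records downloaded for three classes.
class_1 = [{"student_id": "101", "year_level": 5, "score": 42}]
class_2 = [{"student_id": "201", "year_level": 5, "score": 37}]
class_3 = [{"Student No": "301", "Year": 5, "Result": 40}]  # different headings

combined = combine(class_1, class_2)   # same headings, so this succeeds
# combine(class_1, class_3) would raise ValueError until the headings are aligned.
```

This check only compares headings. It would not notice, for instance, that one source records dates or scores in a different style, so a visual scan of the combined data is still worthwhile.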

Data which shows progress over time should always be matched, so that the same students are represented in both sets of figures.
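Matching can be sketched as keeping only the students who appear in both sets of results. The example below is an illustration under assumptions: the `student_id` and `score` field names, the `match_students` helper and the sample figures are hypothetical.

```python
def match_students(earlier, later):
    """Keep only students who appear in both sets of results."""
    earlier_by_id = {record["student_id"]: record["score"] for record in earlier}
    later_by_id = {record["student_id"]: record["score"] for record in later}
    # Only IDs present in both sets can be compared over time.
    shared_ids = sorted(earlier_by_id.keys() & later_by_id.keys())
    return [
        {"student_id": sid, "earlier": earlier_by_id[sid], "later": later_by_id[sid]}
        for sid in shared_ids
    ]


# Hypothetical results from the start and end of the year.
term_1 = [
    {"student_id": "101", "score": 40},
    {"student_id": "102", "score": 35},
    {"student_id": "103", "score": 50},
]
term_4 = [
    {"student_id": "101", "score": 48},
    {"student_id": "103", "score": 55},
    {"student_id": "104", "score": 30},
]

matched = match_students(term_1, term_4)
# Students 102 and 104 each appear in only one set, so they are left out.
```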


Accurate

The data must be accurate. Scan all data for anomalies – are you sure you have the right student records and the right test results? If the data is entered manually, double-check the data entry for accuracy.
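Part of this scan can be automated. The sketch below flags records that need checking by hand. It is only an illustration: the field names, the `find_anomalies` helper, the expected year level and the maximum score are assumptions chosen for the example, and a script like this cannot tell you whether a plausible-looking result belongs to the right student.

```python
def find_anomalies(records, valid_year_levels, max_score):
    """Return (row_number, message) pairs for records that need checking by hand."""
    problems = []
    for row_number, record in enumerate(records, start=1):
        if not record.get("student_id"):
            problems.append((row_number, "missing student number"))
        if record.get("year_level") not in valid_year_levels:
            problems.append((row_number, "unexpected year level"))
        score = record.get("score")
        if score is not None and not 0 <= score <= max_score:
            problems.append((row_number, "score outside possible range"))
    return problems


# Hypothetical records for a Year 5 class sitting a test marked out of 100.
records = [
    {"student_id": "101", "year_level": 5, "score": 42},
    {"student_id": "", "year_level": 5, "score": 37},      # no student number
    {"student_id": "103", "year_level": 15, "score": 40},  # likely a typing error
    {"student_id": "104", "year_level": 5, "score": 140},  # impossible score
]

problems = find_anomalies(records, valid_year_levels={5}, max_score=100)
for row_number, message in problems:
    print(f"Row {row_number}: {message}")
```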


Complete

Each data record should be complete. Make sure that every student record has a result against it. Although it’s important to find out why some students do not have assessment results, their records should be removed from the data set used for immediate analysis. This is particularly important when calculating medians and means, as empty records can skew the results (for example, if an empty cell is counted as a zero).
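The effect of empty records on means and medians can be shown with a small worked example. This is a sketch using Python's standard `statistics` module; the field names and the four sample records are hypothetical, and the "counted as zero" case is one assumed way an empty cell might be mishandled.

```python
from statistics import mean, median

# Hypothetical records; student 102 has no result.
records = [
    {"student_id": "101", "score": 40},
    {"student_id": "102", "score": None},
    {"student_id": "103", "score": 50},
    {"student_id": "104", "score": 60},
]

# Separate complete records from those to follow up later.
complete = [r for r in records if r["score"] is not None]
to_follow_up = [r["student_id"] for r in records if r["score"] is None]

clean_scores = [r["score"] for r in complete]

# What happens if the empty result is mistakenly counted as zero.
skewed_scores = [r["score"] if r["score"] is not None else 0 for r in records]

print("Clean mean and median:", mean(clean_scores), median(clean_scores))
print("Skewed mean and median:", mean(skewed_scores), median(skewed_scores))
print("Follow up with students:", to_follow_up)
```

With the empty record removed, the mean and median are both 50. If the empty result is counted as zero, the mean drops to 37.5 and the median to 45, even though no student actually scored zero.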

To learn how you can use Excel to sort and merge assessment data, view the video tutorial below.