The project AKTeur aims to establish an interdisciplinary cooperation network and a broad technological basis for the automated coding of free-text responses in various settings.

Within the project, two exemplary cases of competence diagnostics are investigated, drawing, among other methods, on automatic language processing procedures. Competence diagnostics often relies on closed answer formats, e.g. multiple choice, to enable automated analysis. This has the disadvantage that a loss of information is already inevitable during categorization. By contrast, free-text formats allow for an elaborate and content-valid measurement of competencies as well as the rating of several facets of a writing product. Moreover, automated coding of open answers can provide graded evaluation (partial credit) and a confidence factor, i.e. a weight for individual assessments.
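The idea of a graded score paired with a confidence weight can be illustrated with a minimal sketch. All names, keywords, and thresholds below are hypothetical illustrations, not the project's actual coding system:

```python
from dataclasses import dataclass

@dataclass
class Coding:
    score: int         # 0 = no credit, 1 = partial credit, 2 = full credit
    confidence: float  # weight in [0, 1] attached to this assessment

def code_answer(answer: str, keywords: set) -> Coding:
    """Toy coder: partial credit from keyword overlap, confidence from the
    distance of the overlap to the nearest scoring boundary."""
    tokens = set(answer.lower().split())
    overlap = len(tokens & keywords) / len(keywords)
    if overlap >= 0.75:
        score = 2
    elif overlap >= 0.25:
        score = 1
    else:
        score = 0
    # Answers close to a decision boundary receive a lower weight.
    confidence = min(abs(overlap - 0.25), abs(overlap - 0.75)) * 2
    return Coding(score, min(confidence, 1.0))
```

A downstream analysis could then weight each automatically coded answer by its confidence, or route low-confidence answers to a human rater.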


The main focus of AKTeur is the creation of a broad technological basis for the automated coding of free-text answers in various settings. To achieve this goal, an interdisciplinary cooperation network has been established, consisting of psychologists, educational researchers, and computer scientists. As examples, we focus on two application settings from competence diagnostics:

    • Automatic coding of quality dimensions of a free text in research on learning to write (‘multidimensional rating’)
    • Automatic coding of short free-answer formats in the psychological diagnostics of ability characteristics (‘unidimensional rating’)


The automatic systems to be developed are based on natural language processing (NLP) procedures. To evaluate the results, the agreement between the system output and a human coder is compared to the agreement between a pair of human coders. This makes it possible to assess whether the system's error rate can be reduced to the level of inconsistency between two human raters. The developed system is examined with diagnostic research questions regarding systematic sources of error, e.g. concerning items, answer formats, and groups of test takers. This also includes explaining ‘inter-task variation’ in the agreement between human and automatic assessment. Moreover, the comparability of coding factors is tested at the scale level by means of structural and measurement model invariance.


Funding: DIPF

Cooperation: Prof. Dr. Gabriele Faust (Universität Bamberg), Prof. Dr. Benno Stein (Bauhaus-Universität Weimar), Prof. Dr. Iryna Gurevych (Technische Universität Darmstadt), Prof. Dr. Manfred Prenzel (TUM School of Education)

Duration: 04/2013 – 12/2014

Project manager: Iryna Gurevych, Frank Goldhammer

Contact: Iryna Gurevych, Frank Goldhammer