Tuesday, June 25, 2013

International Studies of Teacher Evaluation: Student Tests Seldom Cited, Portfolios Carry More Weight

Big news: the U.S. emphasis on student test performance in teacher evaluations is a global anomaly. This is a devastating retort to the edureform/privatizing industry, its media and political supporters, and the acquiescing teacher trade unions.

A number of comparative international studies have become available in the last five years, delving into the different ways that other countries handle teacher evaluation. Diverse qualitative measures abound in these surveys; student test scores are rarely used as conditions for teacher advancement or termination.

(International comparisons references: Laura Figazzolo, "THE USE AND MISUSE OF TEACHER APPRAISAL: An overview of cases in the developed world," Education International [the international federation of teachers' unions], 2013; Marlène Isoré, Teacher Evaluation: Current Practices in OECD Countries and a Literature Review, OECD Working Paper, 2009; Tim Walker, "How Do High-Performing Nations Evaluate Teachers?," NEA Today, March 25, 2013. The link for Isoré is difficult. You may access the initial page [http://www.oecd-ilibrary.org/education/teacher-evaluation-current-practices-in-oecd-countries-and-a-literature-review_223283631428] and the pdf link: http://www.oecd-ilibrary.org/docserver/download/5ksf76jc5phd.pdf?expires=1372295146&id=id&accname=guest&checksum=4C3FAA149C84B4E2B2FC0000C187934F)

The UFT, the AFT and the larger ramifications of these studies
These studies, and their conclusion that high-performing countries perform well without linking test scores to teacher promotion or firing, indicate that this country's teacher unions have by and large not done their job of looking beyond our borders at how other countries evaluate teachers. For all the current talk of international benchmarks and keeping up with the Finns or the Singaporeans, facts and logic indicate that the unions have done a terrible job of research, self-analysis and comparison. Worse, in the face of all the teacher bashing, neither Michael Mulgrew's United Federation of Teachers (UFT) nor Randi Weingarten's American Federation of Teachers (AFT) has taken the opportunity to refute the reformers' line that teachers can be held exclusively responsible for test results. Instead, they have cooperated with evaluation schemes that rest on test scores. They have the resources to research these matters. The Chicago Teachers Union, to its credit, referenced Isoré's 2009 Organisation for Economic Co-operation and Development (OECD) study in one of its statements on teacher evaluation, "Teacher Evaluation or Teacher Collaboration?" Incredibly, the UFT's Mulgrew is promoting the state-imposed teacher evaluation system as a long-needed model of reform.

Major takeaways

I) High-stakes tests and student performance.
A common thread among these studies: reference to test scores is made in some countries, but not in the top-flying countries in global comparisons. Finland has no national standardized test, and Singapore does not use test scores to measure teacher performance. The report authors cite few specific countries that use student test scores; China, the U.K. and the U.S. are the exceptions.
Indeed, Figazzolo's study indicates that the consistently top-performing nations on international tests do not evaluate teachers on the basis of test scores. High flyers, no test incorporation: the following countries consistently perform higher on the PISA and TIMSS international comparative tests of students, yet their teacher evaluation systems do not cite test scores: Austria, Australia, Belgium, Canada, Finland, Japan, Korea, the Netherlands, New Zealand, Slovakia, Sweden, Switzerland. (References: PISA test result rankings, and TIMSS test result rankings, Laura Figazzolo's 2013 study.)
Marlène Isoré's OECD survey provides a number of reasons why test score reliance is avoided. (See summation paragraph, no. 65, and component paragraphs, 60 to 64.)

II) Student portfolios
Student class portfolios carry greater weight than they do in the U.S. (Isoré)
It is interesting: in New York City it is in the alternative schools that we often find student portfolios traditionally trumping the state-wide standardized test. In several countries in Isoré's OECD study, student portfolios carry greater weight in teacher assessment than in the U.S.

III) Evaluations by staff other than school principals
In many countries peers (other teachers) serve as evaluators. The evaluations are meant to be supportive rather than competitive. (Figazzolo; Isoré)

IV) The overall climate is less hostile and more professional

In the U.S., by contrast, the tenor of public discourse is hostile to teachers. Working conditions, worsening by the year, are deskilling the job and making it less professional.

Linda Darling-Hammond has noted the greater respect the high-performing nations afford their instructors.
They enter a well-paid profession – in Singapore earning as much as beginning doctors -- where they are supported by mentor teachers and have 15 or more hours a week to work and learn together – engaging in shared planning, action research, lesson study, and observations in each other’s classrooms.
V) More qualitative assessment

This comes with reports that Charlotte Danielson's reach is greater than we realized. Isoré reports that Danielson or Danielson-influenced evaluation schemes are used in Chile and in Quebec, Canada.

* * *

Laura Figazzolo, "THE USE AND MISUSE OF TEACHER APPRAISAL: An overview of cases in the developed world," Education International [the international federation of teachers' unions], 2013
A review of the technical evidence leads Baker et al (2010) and other sources (Burris, 2012; Strauss, 2012) to conclude that, although standardised test scores of students are one tool school leaders can use to make judgments about teacher effectiveness, such scores can only be a part of an overall comprehensive evaluation. Any sound evaluation has to necessarily involve a balancing of all relevant factors in order to provide a more accurate view of what teachers do in the classroom and their contribution to student learning. In addition, binding teacher evaluation and sanctions to test score results can discourage teachers from wanting to work in schools with the neediest students, while the large, unpredictable variation in the results and their perceived unfairness can undermine teacher morale (Baker et al, 2010). For instance, teachers show lower gains when they have large numbers of new English-learners and students with disabilities than when they teach other students. This is true even when statistical methods are used to “control” for student characteristics (Darling-Hammond, 2012). Surveys have found that teacher attrition and demoralisation have been associated with test-based accountability efforts, particularly in high-need schools.
The use of VAMS is also associated with a narrowing of the curriculum; a de facto curriculum whose subject matter is defined by what is tested. Teachers who rate highest on the low-level multiple-choice tests currently in use are often not those who raise higher scores in assessments of more-challenging learning (Darling-Hammond, 2012). Some believe that the pressure to teach “fill-in-the-bubble tests” will further reduce the focus on research, writing, and complex problem-solving; areas which students will need competence in to compete with their peers in high-achieving countries (Darling-Hammond, 2012). Finally, as far as merit-pay systems are concerned, tying teacher evaluation and remuneration to test results is problematic on numerous levels, not least because it reinforces a competitive spirit that undermines teacher collegiality and teamwork (Froese-Germain, 2011).


2009 OECD Working Paper EDU/WKP(2009)2, Teacher Evaluation: Current Practices in OECD Countries and a Literature Review
By Marlène Isoré
65. As a consequence, despite the attractiveness of the idea, there are numerous caveats against the use of student scores to evaluate teachers. In particular, there is a wide consensus in the literature around two specific directions: student outcomes should not be used as the sole measurement of teacher performance, and student outcomes should not be naively used for career decisions concerning the teacher, including the link to pay, because this incorporates a substantial risk to punish or reward teachers for results beyond their control (Kane and Staiger, 2002; Kupermintz, 2002; McCaffrey et al., 2003; CAESL, 2004; Raudenbush, 2004; Braun, 2005; Ingvarson, Kleinhenz and Wilkinson, 2007; Rowley and Ingvarson, 2007). These rejections from teachers and scholars have materialized, for instance, in the New York State’s legislature decision to ban the use of test scores in evaluating teachers in April 2008.

* * * * *

DELVING FURTHER INTO REASONS WHY USING STUDENT TEST SCORE ACHIEVEMENT IS A DUBIOUS MEASURE:
60. Student learning outcomes is an appealing measure to assess teaching performance, since the ultimate goal of teaching is to improve student learning. Not surprisingly, much research has focused on the use of student achievement as measured by standardised tests to evaluate teachers. For instance, Leigh (2007) recently examined the test scores in literacy and numeracy of three cohorts of students, and concluded that the changes in the relative positions of classes of students provided a basis for the identification of effective and ineffective teachers. Braun (2005) argues that considering student scores is a promising approach for two reasons: first, it moves the discussion about teacher quality towards student learning as the primary goal of teaching, and second, it introduces a quantitative – and thus, objective and fair – measurement of teacher performance. In this respect, the development of “value-added” models represents significant progress relative to methods based on the absolute proportion of students meeting a given achievement level. “Value-added” models are designed to control for the individual students’ previous test scores, and therefore have the potential to identify the contribution an individual teacher made to students’ achievement.
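To make the idea in paragraph 60 concrete, here is a minimal sketch of a value-added calculation. It is an illustration only, not the model used in any of the studies cited: it assumes a single prior score per student and a simple least-squares line, and the function name `value_added`, the teacher labels and the scores are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

def value_added(records):
    """records: list of (teacher, prior_score, current_score) tuples.

    Returns each teacher's mean residual after predicting the current
    score from the prior score with one least-squares line.
    """
    priors = [p for _, p, _ in records]
    currents = [c for _, _, c in records]
    p_bar, c_bar = mean(priors), mean(currents)

    # Ordinary least squares for: current = intercept + slope * prior
    sxx = sum((p - p_bar) ** 2 for p in priors)
    sxy = sum((p - p_bar) * (c - c_bar) for p, c in zip(priors, currents))
    slope = sxy / sxx
    intercept = c_bar - slope * p_bar

    # A teacher's "value added" is the average amount by which his or her
    # students beat (or miss) the score predicted from their prior score.
    residuals = defaultdict(list)
    for teacher, prior, current in records:
        residuals[teacher].append(current - (intercept + slope * prior))
    return {teacher: mean(r) for teacher, r in residuals.items()}

# Made-up data: teacher A's students each gain 10 points, teacher B's gain 0.
toy = [("A", 50, 60), ("A", 60, 70), ("A", 70, 80),
       ("B", 50, 50), ("B", 60, 60), ("B", 70, 70)]
print(value_added(toy))  # {'A': 5.0, 'B': -5.0}
```

Note that the estimates are relative to the average of the students in the sample, and that everything not captured by the prior score ends up attributed to the teacher, which is the concern the later paragraphs raise.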

61. In Florida, the “Special Teachers are Rewarded” (STAR) scheme links salary or bonus awards for individual teachers to value-added measures of student learning (Ingvarson, Kleinhenz and Wilkinson, 2007). Nevertheless, this type of link between a direct measure of performance and pay remains extremely rare, given the numerous statistical and theoretical challenges associated with the use of these methods. Indeed, Braun (2005) emphasises the marked contrast between the enthusiasm of those who would like to use such measurements, mainly policymakers, and the reservations expressed by the researchers who have studied their technical characteristics.

62. Using student achievement on standardised tests to evaluate teacher performance presents numerous statistical challenges. Most authors (Lockwood, Louis and McCaffrey, 2002; Kupermintz, 2003; Braun, 2005; Aaronson, Barrow and Sander, 2007; Goe, 2007) are not convinced that the current generation of value-added models is sufficiently valid and reliable to be used for fairly evaluating individual teachers’ effectiveness. Statistical limitations first refer to the noticeable lack of reliable data, mainly due to the fact that individual students rarely take annual standardised tests. Rowley and Ingvarson (2007) criticise Leigh (2007)’s methodology, which consists of creating a hypothetic test score in the missing data year at the midpoint of two available test results, arguing that it does not allow to fairly attribute the students’ success to the different teachers involved. Second, when data are available, sampling variations can cause imprecision in test score measures; this problem is particularly striking in elementary schools, where the limited number of students per classroom creates large idiosyncrasies of the particular sample of students being tested (Kane and Staiger, 2002).
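The two statistical points in paragraph 62 can be illustrated with a few lines of arithmetic. This is a rough sketch under stated assumptions, not a reconstruction of any cited study: the 15-point spread in student scores, the class sizes and the example scores are hypothetical, and the standard-error formula assumes students' scores are independent with the same spread.

```python
import math

def standard_error_of_class_mean(student_sd, class_size):
    """Standard error of a class's mean score: sd / sqrt(n)."""
    return student_sd / math.sqrt(class_size)

def impute_midpoint(score_before, score_after):
    """Fill a missing test year with the midpoint of the two known scores."""
    return (score_before + score_after) / 2

# Small classes give noisier averages (hypothetical 15-point spread).
print(standard_error_of_class_mean(15, 25))   # 3.0
print(standard_error_of_class_mean(15, 100))  # 1.5

# A student scored 40 in year 1 and 60 in year 3; year 2 was not tested.
mid = impute_midpoint(40, 60)
gain_credited_to_year2_teacher = mid - 40
gain_credited_to_year3_teacher = 60 - mid
print(gain_credited_to_year2_teacher, gain_credited_to_year3_teacher)  # 10.0 10.0
```

Whatever actually happened in the two classrooms, midpoint imputation credits both teachers with exactly half of the total gain, which is why it cannot fairly separate their contributions.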

63. Broader methodological criticisms stress that value-added models, whatever their degree of sophistication, can neither fully integrate all factors influencing student achievement scores – qualitative by nature – nor reflect all student learning outcomes. Family background and support, school attendance, peer and classroom climate, school policies, availability of adequate materials, and children effects influence student learning (CAESL, 2004; Ingvarson, Kleinhenz and Wilkinson, 2007; Goe, 2007, Weingarten, 2007). Specific factors at the time of the test – “a dog barking in the playground, a severe flu season, a disruptive student in a class” – can also affect one student’s results independently from his teacher’s contribution (Kane and Staiger, 2002). Moreover, good teachers are likely to have an impact on children’s achievement during several years after having taught to them; and conversely, after several years of ineffective teachers, students may never be able to catch up academically. These teacher ‘cumulative effects’ cannot be accurately measured at discrete points in time (Hanushek, 1986; Sanders and Rivers, 1996; CAESL, 2004). Finally, teaching impact on students is not restricted to areas assessed through student standardised tests, – generally limited to reading and numeracy –, but also include transfer of psychological, civic and lifelong learning skills (Margo et al., 2008). While Xin, Xu and Tatsuoka (2004) tried to decompose single test scores into several categories of cognitive abilities in four countries (Japan, Korea, the Netherlands and the United States), they found that teachers’ attributes used in pay decisions have no consistent positive impact on any type of cognitive skills, despite their attention to controlling for individual and family background. These are sources of skepticism for using such statistical methods.

64. Theoretical limitations also need to be considered. First, a statistical correlation is not a causal relationship: the fact that teachers matter for student learning does not necessarily indicate that student learning is the result of good teaching. Second, the standardised tests used to differentiate students are not specifically designed for the purpose of assessing teachers. Following Popham (1997), Goe (2007) argues that they were not engineered to be particularly sensitive to small variations in instruction or to sort out teacher contributions to student learning. Thus they do not provide a solid basis on which to hold teachers accountable for their performance. Third, using student tests scores to evaluate teachers may induce unexpected distortions and constrictions in teacher behaviour towards the sole achievement on standardised tests. High-stakes incentive schemes based on standardised tests can incite teachers to concentrate exclusively on teaching areas assessed in the tests – therefore reducing the curriculum to the basic skills generally tested – (Jacob and Lefgren, 2005, Weingarten, 2007), incite teachers to concentrate on the specific students who are close to passing mark at the expense of children who are behind or ahead (Weingarten, 2007), and even provoke serious cases of teacher cheating on standardised tests (Jacob and Levitt, 2003; Jacob, 2005). Furthermore, test results may identify teachers who are ineffective or should professionally develop but do neither permit to fairly discriminate between the wide range of effective teachers nor identify which professional development activities should be established in order to improve their performance (Braun, 2005). Finally, it may lead to holding teachers responsible for the whole student performance whereas one should instead recognise that successful teaching is a shared responsibility between governments, schools and the teaching profession (Ingvarson et al., 2007).
* * * * *
84. Measuring the effect of teacher evaluation faces a number of challenges. First, it needs to control for the broad set of qualitative variables which are likely to influence student learning. These variables encompass teacher characteristics (e.g. age, gender), teacher education and experience, students’ family factors (e.g. parents’ background, parents’ support), school factors (e.g. school policies, school incentives, peer and classroom effects) and student factors (e.g. motivation, cognitive abilities, cumulative experience). The complex realities of education prevent researchers from accurately assimilating these factors as traditional inputs into production functions (Hanushek, 1986). Second, because of its qualitative and heterogeneous nature, the output itself – student learning – is not a traditionally measurable ‘end product’, and this makes the decomposition between different factor contributions even more difficult (Hanushek, 1986; Ingvarson et al., 2007). This does not mean that doing any quantitative study in education is vain but rather than it requires particular attention to analytical issues or potential misinterpretations of the results. A particular focus should be placed on the fact that each factor omission or measurement problem – including lack of data – creates a potential quantitative bias in the estimated relationship between teacher quality and student achievement (Xin, Xu and Tatsuoka, 2004).

85. As a consequence, the empirical literature that primarily indicated that teacher evaluation may have an important role in student learning came from a process of elimination. By contradicting or restricting the respective roles of individual teachers’ apparent features (whether characteristics, education, experience, or financial incentives), numerous studies concluded that it was teacher practices – and, by extrapolation, evaluation of these practices – that indeed matter. The first influential contribution was Hanushek’s distinction between observable aspects of teachers, such as teacher background, gender, or race, and teachers’ unquantifiable “skills” (Hanushek, 1986, 1992). According to Hanushek, if the previous literature has found no significant impact of teacher quality on student achievement, it was because it concentrated on observable attributes of teachers – teacher’s holding of a master degree for example – while teacher quality was instead related to their “skills” or “idiosyncratic choices of teaching and methods” (such as classroom management, methods of presenting abstract ideas, communication skills, and so forth), i.e. their practice.
* * * * *
A UNITED STATES EVALUATION SAMPLE, ALONG WITH SOME GLOBAL EXAMPLES OF TEACHER EVALUATION:

ANNEX 2: EXAMPLES OF TEACHER EVALUATION SYSTEMS IN OECD COUNTRIES
1. Teacher evaluation for summative purposes with links to pay: The US District of Cincinnati [Milanowski, 2004]
Context: Cincinnati is a large urban district with 48,000 students and 3,000 teachers in more than 70 schools and programmes. Its average level of student achievement is low compared to the surrounding suburban districts. Cincinnati has also had a history of school reform activity, including the introduction of new whole-school designs, school-based budgeting, and teams to run schools and deliver instruction. The union-management relationship has generally been positive. Like many other urban districts, state accountability programmes and public expectations have put pressure on the district to raise student outcomes.

Implementation: In response to the obsolescence of the existing teacher performance evaluation system, and ambitious goals for improving student achievement, the District designed a knowledge- and skill-based pay system and a new teacher evaluation system during the 1998-1999 school year. The assessment system was piloted in the 1999-2000 school year and is used for teacher evaluation district wide since the 2000-01 school year.

Criteria: The assessment system is based on a set of teaching standards derived from the Framework for Teaching (Danielson, 1996). Seventeen performance standards are grouped into four domains: (i) planning and preparation; (ii) creating an environment for learning; (iii) teaching for learning; and (iv) professionalism. For each standard, a set of behaviourally anchored rating scales called rubrics describe four levels of performance: unsatisfactory, basic, proficient, and distinguished.

Instruments: Teachers are evaluated using the rubrics based on two major sources of evidence: six classroom observations and a portfolio prepared by the teacher. The portfolio includes artifacts such as lesson and unit plans, attendance records, student work, family contact logs, and documentation of professional development activities.

Evaluators: Four classroom observations are made by a teacher evaluator hired from the ranks of the teaching force and released from classroom teaching for three years. Principals and assistant principals do the other two observations.

Aggregation of scores: Based on summaries of the six observations, teacher evaluators make a final summative rating on each of the standards in domains (ii) and (iii), whereas principals and assistant principals rate teachers on the standards in domains (i) and (iv), primarily based on the teacher portfolio. Standards-level ratings are then aggregated to a domain-level score for each of the four domains.

Scope and frequency of the evaluation: The full assessment system is used for a comprehensive evaluation of teachers in their first and third years and every five years thereafter. A less intensive assessment is done in all other years, conducted only by principals and assistant principals and based on more limited evidence. The annual assessment is intended to be both an opportunity for teacher professional development and an evaluation for accountability purposes.
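As a rough illustration of the aggregation step in the Cincinnati system, the sketch below turns standards-level ratings into one score per domain. The OECD paper does not say how the aggregation is done, so the 1 to 4 numeric scale and the use of a simple average are assumptions made here for illustration, as are the function name `domain_scores` and the sample ratings.

```python
from statistics import mean

# Assumed numeric scale for the four rubric levels (not given in the source).
LEVELS = {"unsatisfactory": 1, "basic": 2, "proficient": 3, "distinguished": 4}

def domain_scores(ratings):
    """ratings: {domain name: [rubric level for each standard in the domain]}.

    Returns {domain name: average numeric rating}. Averaging is an
    assumption; the district's actual rule may differ.
    """
    return {
        domain: mean([LEVELS[level] for level in levels])
        for domain, levels in ratings.items()
    }

# Hypothetical ratings for one teacher, two standards per domain.
sample = {
    "planning and preparation": ["basic", "proficient"],
    "creating an environment for learning": ["proficient", "proficient"],
    "teaching for learning": ["proficient", "distinguished"],
    "professionalism": ["distinguished", "distinguished"],
}
print(domain_scores(sample))
```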

Training on the evaluation process: Both teachers and evaluators receive considerable training on the new system. Evaluators are trained using a calibration process that involves rating taped lessons using the rubrics and then comparing ratings with expert judges and discussing differences. To ensure consistency among evaluators, the district eventually requires that all evaluators, including principals, meet a standard of agreement with a set of references or expert evaluators in rating videotaped lessons. Since the 2001-02 school year, only those who meet the standards are allowed to evaluate.

Direct consequences: For beginning teachers (those evaluated in their first and third years), the consequence of a poor comprehensive evaluation could be the termination of the contract. For tenured teachers, consequences of a positive evaluation could include eligibility to become a lead teacher. A poor evaluation could lead to placement in the peer assistance programme and to the eventual termination of the contract.

Link to pay: The performance evaluation system was designed in part to provide the foundation for the knowledge- and skill-based pay system. This system defines career levels for teachers with pay differentiated by level. The new pay system was originally scheduled to come into effect in the 2002-03 school year, resulting in relatively high stakes evaluations for the district’s teachers. However, the link between the evaluation system and pay was rejected by teachers in a special election held in May 2002.

2. Teacher evaluation for formative purposes and as part of broader school policies
2a. Finland [UNESCO, 2007]
Context: In Finland, school teachers have positions comparable to national or municipal public servants. However, school leaders are in charge of teacher selection – once the required license is obtained – and in charge of all the policies that are considered as necessary to the enhancement of teaching quality, among which teacher evaluation. Finland is a paradigmatic case where the former system of ‘teachers and schools inspection and supervision’ was removed in 1990 but not replaced by another similar external system. As a consequence, teacher evaluation currently goes hand in hand with other policies within each particular school.

Methods / Evaluators: The Finnish scheme of teacher evaluation is characterised by the very high level of confidence placed in school and teacher competencies and professionalism as a basis to improve teaching quality. Thus, teacher self-evaluation is considered as a prime means of professional optimisation. School leaders also have a crucial role in engaging teachers in self-reflection about their own practice, and in developing a culture of evaluation alongside ambitious goals, according to the school context and challenges. The majority of schools have implemented annual discussions between school leaders and teachers to evaluate the fulfillment of the personal objectives set up during the previous year and to establish further personal objectives that correspond both to the analysis of the teacher and the needs of the school.
2b. England [Ofsted, 2006; TDA, 2007]
Context: The English system was originally designed with summative purposes, aiming at evaluating teachers’ performance, and providing them with opportunities to access a higher career stage and the corresponding pay scale. However, numerous concerns about the fairness of the process and the potential perverse impacts of the procedure on teacher performance itself were addressed (Kleinhenz and Ingvarson, 2004). Hence, the recent developments of the system – including new professional standards from September 2007 – indicate an increased formative approach, embodied by a willingness to reinforce the link between the teacher appraisal system and teacher professional development needs relative to the school goals. More generally, the system, completed within a wider framework for the whole school workforce, aims to improve school leadership and to be an integral part of the school’s broader policies.

Scope/Methods: The evaluation is differentiated according to the career stage of the teacher being evaluated. Five professional stages are identified: (i) the award of the Qualified Teacher Status (Q); (ii) teachers on the main scale (Core) (C); (iii) teachers on the upper pay scale (Post Threshold Teachers) (P); (iv) Excellent Teachers (E); and (v) Advanced skills Teachers (A).

Criteria: At each stage, teaching professional standards encompass three domains. The first one refers to the teacher’s professional attributes, including relationships with children and young people; attitude vis-à-vis the framework and the implementation of new school policies; communicating and working with others; and professional development activities. The second domain is composed of the teacher’s professional knowledge and understanding, including knowledge on teaching and learning; understanding of assessing and monitoring; subjects and curriculum knowledge; literacy, numeracy and ICT skills; understanding the factors affecting the achievement of diversified student groups; and knowledge on student health and well-being. The last domain refers to the teacher’s professional skills, including planning, teaching, assessing, monitoring, giving feedback competencies; ability to review and adapt teaching and learning; ability to create a learning environment; capacities to develop team working and collaboration. All of these standards are statements of good teaching which do not replace the professional duties and responsibilities of teachers.

Consequences on teacher professional growth and links to school expectations and policies: The standards support teachers in identifying their professional development needs. Where teachers wish to progress to the next career stage, the next level of the framework provides a reference point for all teachers when considering future development. Whilst not all teachers necessarily want to move to the next career stage, the standards also support teachers in identifying ways to broaden and deepen their expertise within their current career stages. These frameworks are a basis for professional responsibility and contractual engagement to engage all teachers in effective, sustained and relevant professional development throughout their careers. They provide a continuum of expectations about the level of engagement in professional development that provides clarity and appropriate differentiation for each career stage. They also set expectations about the contribution teachers make to others, taking account of their levels of skills, expertise and experience, their role within the school, and reflecting on their use of up-to-date subject knowledge and pedagogy. In all these cases, performance management is the key process that provides the context for regular discussions about teachers’ career aspirations and their future development, within or beyond their current career stage.

For further information:
• Training and Development Agency for Schools (TDA): http://www.tda.gov.uk/teachers/professionalstandards.aspx and http://www.tda.gov.uk/teachers/continuingprofessionaldevelopment.aspx
• Office for Standards in Education (Ofsted): http://www.ofsted.gov.uk
3. Conciliating the summative and formative purposes in a comprehensive approach: Chile [Avalos and Assael, 2006]
Context: The historical context of the Chilean educational system has doubtlessly played a critical role in understanding the necessity for a comprehensive and conciliating teacher evaluation scheme. In 1980, the military government [1973-1990] transferred the management of schools to the municipal authorities, which also implied a change of status of teachers from public servants to salaried employees of municipalities. At the end of the dictatorial regime, a major concern was that teachers’ conditions did not evolve in line with those for public servants, which had an enormous impact on how teachers perceived and valued themselves, as well as on public opinion. In the 1990s the teaching profession suffered from a dramatic deterioration of the quality of applicants to teaching and from worsened working conditions. At the same time, evidence of unsatisfactory student learning results put a strong pressure on the government to include a clause in the new Teacher Statute (1991) that required a yearly evaluation of teachers. But while teachers continued to make their case for improved salaries and working conditions, they rejected the implementation of the evaluation system. This was followed by a long period of discussions and negotiations on the teacher evaluation model to be implemented.
Design and implementation of the system: The system was enacted by law in August 2004, that is, some seven years after the initial discussions. The system is directed toward the improvement of teaching and learning outcomes. It is designed to stimulate teachers to further their own improvement through the learning of their strengths and weaknesses. It is based on explicit criteria of what is evaluated, but without prescribing a model of teaching. It rests on the articulation of its different elements: criteria sanctioned by the teaching workforce, an independent management structure, especially prepared evaluators, and a coordinated set of procedures to gather the evidence required by the criteria.
Key actors in the system: The Centre for In-service Training located in the Ministry of Education (Centro de Perfeccionamiento, Experimentación e Investigación Pedagógica) manages the system. A consultative committee composed of academics and representatives from the Teachers’ Union, the Chilean Association of Municipalities and the Ministry of Education, monitors and provides advice on the process. A university centre is contracted to implement the process: production and revision of instruments, selection and preparation of evaluators and scorers, and analysis of evidence gathered from each evaluation process. The application process itself is decentralised so that in every district there is a committee that is directly responsible for organising the evaluation procedures. The evidence gathered is processed at the district level and sent to the central processing unit at the university, together with contextual information that can help interpret results. This central form of processing the evidence follows a request by teachers with the purpose of greater objectiveness.

Criteria: The Ministry of Education took the lead in defining the assessment criteria, producing a set of standards based on the work done earlier for the initial teacher education standards and on Danielson’s Framework for Teaching. The result is a framework for competent teaching formulated in four teaching domains (planning, learning environment, professionalism and teaching strategies for the learning of all students) and twenty criteria/standards. The framework was the subject of wide consultations among teachers until an agreement was reached. The criteria are linked to four levels of quality/performance: ‘unsatisfactory’, ‘basic’, ‘competent’ and ‘excellent’.

Instruments: The evidence used to evaluate the teachers, structured around the Framework, includes four sources: (i) a portfolio with samples of teachers’ work and a video of one of their lessons; (ii) a structured self-evaluation form; (iii) a structured interview with a peer evaluator; and (iv) a report from the school management and pedagogic authorities. The evaluation takes place every four years.
Training of evaluators: The peer evaluators are specifically prepared for their task and must pass a test to be accredited. Although they should be familiar with the context in which the evaluated teacher is based (e.g. socio-economic and working conditions) they may not be teachers in the same school.
Consequences of the evaluation: One of the main challenges that needed to be addressed during the negotiation process referred to the potential implications for the individual teacher evaluated. It was agreed that teachers rated as being at a ‘basic’ level are provided with specific professional development opportunities in order to improve. Teachers rated as performing ‘unsatisfactorily’ are also provided with professional development opportunities, but are evaluated again one year later; if the teacher fails to perform satisfactorily in two consecutive evaluations, he or she is dismissed. By contrast, teachers assessed as ‘competent’ or ‘exceptionally competent’ are given priority in promotion opportunities and in professional development activities of their interest. They may also apply for a salary bonus provided that they take a test on curricular and pedagogical knowledge. The system has both summative and formative elements instead of being primarily dedicated to one of the purposes, which is the result of a negotiation process that took the multiple stakeholders’ interests into account. For instance, the summative elements include neither a link between teachers’ performance and student results (something the union strongly opposed) nor a link to the career ladder. The link to professional development is emphasised and differentiated on the basis of the teacher’s level of performance.
For further information: Chile’s laws on the Teaching Statute: Ley N°3.500; Ley N°19.070; Ley N° 19.933; Ley N° 19.961.

4. Teacher evaluation stemming from bureaucratic procedures: France [Haut Conseil de l’évaluation de l’école, 2003; Pochard, 2008]
Context: French teachers are classified in three distinct categories according to their education and initial certification: primary education teachers (professeurs des écoles), secondary education teachers with a regular certification (enseignants certifiés), and secondary education teachers with a higher level of certification (enseignants agrégés). All teachers are public servants but are placed in one of these three career tracks. These differ in terms of conditions and hours of work, administrative pay scale, and teaching practice (multitask primary education teachers vs. subject-specialised secondary education teachers). France does not generally suffer from teacher shortages and examinations to enter the profession continue to be selective. However, France has concerns regarding the societal status of teaching, and the skills necessary to respond to school needs. The current teacher evaluation system is often described as ‘not very fair’, ‘not very efficient’, and ‘generating malaise and sometimes suffering’ for both evaluated teachers and evaluators, because it is based on administrative procedures rather than a comprehensive scheme with a clear improvement purpose.
Periodicity of evaluation/evaluators: Teacher evaluation is supposed to be undertaken on a regular basis, as an integral part of the work and duties of the teacher. Primary education teachers are evaluated by a teaching inspector (inspecteur), while secondary level teachers are evaluated by a panel composed of an inspector, who defines 60% of the final score, and the school principal, who is responsible for the other 40%. However, the intended frequent evaluations often fall short of expectations. First, the frequency of evaluations is not legally fixed, and is arbitrarily determined by the inspectors’ availability. This is a cause for concern regarding the fairness of the system, because teachers working under the same rules receive feedback at diverse intervals, as well as regarding its efficacy: the average interval between two evaluations is 3-4 years in primary education and 6-7 years in secondary education, which is deemed much too long. Moreover, the workload is such that concerns might be raised regarding the value of the feedback. An inspector takes responsibility for between 350 and 400 teachers, which is too many for the feedback to be effective in improving teachers’ practices. As a consequence, the inspectors themselves report malaise and frustration associated with the evaluation process, mainly because they feel that they have little impact on teaching practices and cannot develop their competences and skills for teaching enhancement. Their role is sometimes de facto restricted to controlling abuses within the profession.
Instruments: Evidence on the teacher’s practice is gathered through the observation of a teaching session, followed by an interview with the teacher. Criticisms of this approach include: (i) the fact that a single classroom observation might not be enough to forge a fair and accurate view of the teacher’s abilities and knowledge; and (ii) in the interview teachers focus on reacting to the inspectors’ criticisms instead of discussing their particular needs for improvement. The whole procedure does not seem to give much room for self-evaluation and teachers’ reflection on their own practice and performance.
Criteria: Both ‘pedagogical’ and ‘administrative’ aspects are observed and rated but with no reference to a framework which defines what ‘good’ teaching is. Concerns are numerous. The nature of the different ‘pedagogical’ skills assessed, as well as their weight in the overall appreciation of the teacher, remains largely at the discretion of each inspector. This reinforces subjective appraisals and unpredictable, random results, at the expense of fairness and accuracy in the process. Teachers report not knowing how and on what criteria they are evaluated. The most objective and understood criteria used to evaluate teachers are the ‘administrative’ ones such as punctuality and attendance. As a result, the rating obtained by a teacher often remains primarily determined by their certification rating (i.e. result of entrance examination).
Consequences: The consequences of the teacher’s evaluation on the career are limited, except in cases of serious misconduct. Teachers’ salaries are determined by a single salary scale in which progression depends on years of service and the initial qualifications and entrance examination. Commitment to work is rarely recognised or valued, nor are merit, outstanding performance, or initiatives seeking to improve student learning. In addition, there is no link to professional development activities, the latter being very limited and disconnected from teachers’ identified weaknesses. The evaluation process does not provide opportunities for self-reflection on teaching practices or for peer mutual learning, and entails little advice and coaching.

For further information:
• Haut Conseil de l’évaluation de l’école (2003): http://cisad.adc.education.fr/hcee/documents/rapport_annuel_2003.pdf
• Rapport des inspections générales: http://lesrapports.ladocumentationfrancaise.fr/BRP/054004446/0000.pdf
• Ministry of Education: http://www.education.gouv.fr/cid263/l-evaluation-des-personnels.html
About other teacher evaluation systems:
• The Canadian Province of Alberta: http://www.education.alberta.ca/department/policy/k12manual/section2/teacher.aspx
• The US State of Iowa: http://www.iowa.gov/educate/content/view/1450/1617

* * * * *

REFERENCES
Aaronson, D.; Barrow, L. and Sander, W. (2007) “Teachers and Student Achievement in the Chicago Public High Schools”, Journal of Labor Economics, Vol. 25, No. 1, pp 95-135.
American Federation of Teachers (2001) “Beginning Teacher Induction: The Essential Bridge”, Educational Issues Policy Brief No. 13, AFT, 2001.
American Federation of Teachers and National Education Association (2008) A Guide to Understanding National Board Certification, AFT and NEA, Washington, DC.
Anderson, L. and Pellicer, L. (2001) Teacher Peer Assistance and Review, Corwin Press.
Avalos, B. and Assael, J. (2006) “Moving from resistance to agreement: The case of the Chilean teacher performance evaluation”, International Journal of Educational Research, Vol. 45, No. 4-5, pp 254-266.
Beck, R.; Livne, N. and Bear, S. (2005) “Teachers’ self-assessment of the effects of formative and summative electronic portfolios on professional development”, European Journal of Teacher Education, Vol. 28, No. 3, pp 221-244.
Bolino, M. and Turnley, W. (2003) “Counternormative impression management, likeability, and performance ratings: the use of intimidation in an organizational setting”, Journal of Organizational Behavior, Vol. 24, No. 2, pp 237-250.
Bond, L.; Smith, T.; Baker, W. and Hattie, J. (2000) “The Certification System of the National Board for Professional Teaching Standards: A Construct and Consequential Validity Study”, NBPTS, 2000.
Borman, G. and Kimball, S. (2005) “Teacher Quality and Educational Equality: Do Teachers with Higher Standards-Based Evaluation Ratings Close Student Achievement Gaps?”, The Elementary School Journal, Vol. 106, No. 1, pp 3-20.
Braun, H. (2005) “Using Student Progress to Evaluate Teachers: A Primer on Value-Added Models”, Educational Testing Service (ETS), 2005.
Casson, H. Jr (2007) “Reducing Teacher Moral Hazard in the U.S. Elementary and Secondary Educational System through Merit-pay: An Application of the Principal – Agency Theory”, Forum for Social Economics, Vol. 36, No. 2, pp 87-95.
Campbell, D.; Melenzyer, B.; Nettles, D. and Wyman, R. (2000) Portfolio and performance assessment in teacher education, Needham Heights, MA, Allyn & Bacon.
Cavalluzzo, L. (2004) “Is National Board Certification An Effective Signal of Teacher Quality?”, The CNA Corporation, Alexandria, Virginia, 2004.
Center for Assessment and Evaluation of Student Learning (2004) “Using Student Tests to Measure Teacher Quality”, CAESL Assessment Brief No. 9.
Center for Teaching Quality (2008) “Measuring What Matters: The Effects of National Board Certification on Advancing 21st Century Teaching and Learning”, CTQ, 2008.
Cohen, C. and King Rice, J. (2005) “National Board Certification as Professional Development: Design and Cost”, NBPTS, 2005.
Corcoran, T. (2007) “Teaching Matters: How State and Local Policymakers Can Improve the Quality of Teachers and Teaching”, Consortium for Policy Research in Education (CPRE) Policy Briefs RB-48.
Danielson, C. (2001) “New Trends in Teacher Evaluation”, Educational Leadership, Vol. 58, No. 5, pp 12-15.
Danielson, C. (1996, 2007) Enhancing Professional Practice: A Framework for Teaching, 1st and 2nd editions, Association for Supervision and Curriculum Development (ASCD), Alexandria, Virginia.
Danielson, C. and McGreal, T. (2000) Teacher Evaluation to Enhance Professional Practice, Association for Supervision and Curriculum Development (ASCD), Alexandria, Virginia.
Darling, L. (2001) “Portfolio as practice: the narratives of emerging teachers”, Teaching and Teacher Education, Vol. 17, pp 107-121.
Darling-Hammond, L.; Pecheone, R. and Stansbury, K. (2004) “Beginning Teacher Quality: What Matters for Student Learning?”, Research proposal from Stanford University to the Carnegie Corporation of New York, available at www.pacttpa.org/_files/Publications_and_Presentations/Carnegie_grant_proposal.doc.
Day, C. and Gu, Q. (2007) “Variations in the conditions for teachers’ professional learning and development: sustaining commitment and effectiveness over a career”, Oxford Review of Education, Vol. 33, No. 4, pp 423-443.
Department of Education, Science and Training (2007) “Performance-based rewards for teachers”, DEST Research Papers, 2007.
Elmore, R. (2000) Building a New Structure for School Leadership, Albert Shanker Institute, Winter 2000.
Figlio, D. and Kenny, L. (2007) “Individual teacher incentives and student performance”, Journal of Public Economics, Vol. 91, No. 5-6, pp 901-914.
Freund, M.; Kane Russell, V. and Kavulic, C. (2005) “A Study of the Role of Mentoring in Achieving Certification by the National Board for Professional Teaching Standards”, NBPTS, 2005.
Goe, L. (2007) “The Link Between Teacher Quality and Student Outcomes: A Research Synthesis”, National Comprehensive Center for Teacher Quality, 2007.
Goldhaber, D. and Anthony, E. (2007) “Can Teacher Quality Be Effectively Assessed? National Board Certification As a Signal of Effective Teaching”, The Review of Economics and Statistics, Vol. 89, No. 1, pp 134-150.
Greenfield, W. (1995) “Toward a Theory of School Administration: the Centrality of Leadership”, Educational Administration Quarterly, Vol. 31, No. 1, pp 61-85.
Halverson, R.; Kelley, C. and Kimball, S. (2004) “Implementing Teacher Evaluation Systems: How Principals Make Sense of Complex Artifacts to Shape Local Instructional Practice” in Educational Administration, Policy and Reform: Research and Measurement Research and Theory in Educational Administration, Vol. 3, W.K. Hoy and C.G. Miskel (Eds.) Greenwich, CT: Information Age Press.
Hanushek, E. (2004) “Does School Accountability Lead to Improved Student Performance?”, NBER Working Papers n°10591.
Hanushek, E. (1992) “The Trade-Off between Child Quantity and Quality”, Journal of Political Economy, Vol. 100, No. 1, pp 84-117.
Hanushek, E. (1986) “The Economics of Schooling: Production and Efficiency in Public Schools”, Journal of Economic Literature, Vol. 24, No. 3, pp 1141-1177.
Hanushek, E.; Kain, J.; O’Brien, D. and Rivkin, S. (2005) “The Market for Teacher Quality”, NBER Working Papers n°11154.
Harris, D. and Sass, T. (2007) “The Effects of NBPTS-Certified Teachers on Student Achievement”, Center for Analysis of Longitudinal Data in Education Research (CALDER), Working Paper No. 4.
Haut Conseil de l’évaluation de l’école (2003) Rapport annuel, HCéé, 2003.
Heneman, H. and Milanowski, A. (2003) “Continuing Assessment of Teacher Reactions to a Standards-Based Teacher Evaluation System”, Journal of Personnel Evaluation in Education, Vol. 17, No. 2, pp 173-195.
Heneman, H.; Milanowski, A. and Kimball, S. (2007) “Teacher Performance Pay: Synthesis of Plans, Research, and Guidelines for Practice”, Consortium for Policy Research in Education (CPRE) Policy Briefs RB-46.
Heneman, H.; Milanowski, A.; Kimball, S. and Odden, A. (2006) “Standards-Based Teacher Evaluation as a Foundation for Knowledge- and Skill-Based Pay”, Consortium for Policy Research in Education (CPRE) Policy Briefs RB-45.
Hess, F. and West, M. (2006) “A Better Bargain: Overhauling Teacher Collective Bargaining for the 21st Century”, Cambridge, MA: Program on Education Policy and Governance, Harvard University.
Holland, P. (2005) “The Case for Expanding Standards for Teacher Evaluation to Include an Instructional Supervision Perspective”, Journal of Personnel Evaluation in Education, Vol. 18, No. 1, pp 67-77.
Ingvarson, L.; Kleinhenz, E. and Wilkinson, J. (2007) Research on Performance Pay for Teachers, Australian Council for Educational Research (ACER), 2007.
Interstate New Teacher Assessment and Support Consortium (1992) “Model Standards for Beginning Teacher Licensing, Assessment and Development: A Resource for State Dialogue”, INTASC, Council of Chief State School Officers (CCSSO), 1992.
Jacob, B. (2004) “Accountability, Incentives and Behavior: The Impact of High-Stakes Testing in the Chicago Public Schools”, Journal of Public Economics, Vol. 89, No. 5-6, pp 761-796.
Jacob, B. and Lefgren, L. (2008) “Can Principals Identify Effective Teachers? Evidence on Subjective Performance Evaluation in Education”, Journal of Labor Economics, Vol. 26, No. 1, pp 101-136.
Jacob, B. and Lefgren, L. (2005b) “What Do Parents Value in Education: an Empirical Investigation of Parents’ Revealed Preferences for Teachers”, NBER Working Paper n°11494.
Jacob, B. and Lefgren, L. (2005a) “Principals as Agents: Subjective Performance Measurement in Education”, NBER Working Papers n°11463.
Jacob, B. and Levitt, S. (2003) “Rotten Apples: An Investigation of the Prevalence and Predictors of Teacher Cheating”, The Quarterly Journal of Economics, Vol. 118, No. 3, pp 843-878.
Jacobs, C.; Martin, S. and Otieno, T. (2008) “A Science Lesson Plan Analysis Instrument for Formative and Summative Program Evaluation of a Teacher Education Program”, Science Teacher Education (Articles online in advance of print).
Jun, M.-K.; Anthony, R.; Achrazoglu, J. and Coghill-Behrends, W. (2007) “Using ePortfolio for the Assessment and Professional Development of Newly Hired Teachers”, TechTrends, Vol. 51, No. 4, pp 45-50.
Kane, T. and Staiger, D. (2002) “Volatility in School Test Scores: Implications for Test-Based Accountability Systems”, Brookings Papers on Education Policy, Washington, DC.
Kennedy, M. (2005) Inside Teaching, Harvard University Press, London, England, 2005.
Kimball, S. (2002) “Analysis of Feedback, Enabling Conditions and Fairness Perceptions of Teachers in Three School Districts with New Standards-Based Evaluation Systems”, Journal of Personnel Evaluation in Education, Vol. 16, No. 4, pp 241-268.
Kimball, S.; Milanowski, T. and McKinney, S. (2007) “Implementation of Standards-Based Principal Evaluation in One School District: First Year Results From Randomized Trial”, Paper presented at the annual conference of the American Educational Research Association, available at http://cpre.wceruw.org/publications/KimballMilanowskiMcKinney.pdf.
Klecker, B. (2000) “Content validity of preservice teacher portfolios in a standards-based program”, Journal of Instructional Psychology, Vol. 27, No. 1, pp 35-38.
Kleinhenz, E. and Ingvarson, L. (2004) “Teacher Evaluation Uncoupled: A Discussion of Teacher Evaluation Policies and Practices in Australian States and Their Relation to Quality Teaching and Learning”, Research Papers in Education, Vol. 19, No. 1, pp 31-49.
Leigh, A. (2007) “Estimating Teacher Effectiveness From Two-Year Changes in Students’ Test Scores”, Research School of Social Sciences, Australian National University, available at http://econrsss.anu.edu.au/~aleigh/pdf/TQPanel.pdf.
Levin, J. (2003) “Relational incentive contracts”, American Economic Review, Vol. 93, No. 3, pp 835–57.
Lustick, D. and Sykes, G. (2006) “National Board Certification as Professional Development: What are Teachers Learning?”, Education Policy Analysis Archives, Vol. 14, No. 5.
MacLeod, B. (2003) “Optimal contracting with subjective evaluation”, American Economic Review, Vol. 93, No. 1, pp 216-240.
Mansvelder-Longayroux, D.; Beijaard, D. and Verloop, N. (2007) “The portfolio as a tool for stimulating reflection by student teachers”, Teaching and Teacher Education, Vol. 23, No. 1, pp 47-62.
Margo, J.; Benton, M.; Withers, K. and Sodha, S. (2008) Those Who Can?, Institute for Public Policy Research (IPPR) Publications, 2008.
Marshall, K. (2005) “It’s Time to Rethink Teacher Supervision and Evaluation”, Phi Delta Kappan, Vol. 86, No. 10, pp 727-735.
McColskey, W. and Stronge, J. (2005) “A Comparison of National Board Certified Teachers and non-National Board Certified Teachers: Is there a difference in teacher effectiveness and student achievement”, NBPTS, 2005.
Milanowski, A. (2007) “Performance Pay System Preferences of Students Preparing to Be Teachers”, American Education Finance Association, 2007.
Milanowski, A. (2004) “The Relationship Between Teacher Performance Evaluation Scores and Student Achievement: Evidence From Cincinnati”, Peabody Journal of Education, Vol. 79, No. 4, pp 33-53.
Milanowski, A. and Heneman, H. (2001) “Assessment of Teacher Reactions to a Standards-Based Teacher Evaluation System: A Pilot Study”, Journal of Personnel Evaluation in Education, Vol. 15, No. 3, pp 193-212.
Milanowski, A. and Kimball, S. (2003) “The Framework-Based Teacher Performance Assessment Systems in Cincinnati and Washoe”, CPRE Working Paper Series TC-03-07.
Ministerial Council on Education, Employment Training and Youth Affairs (2003) “A National Framework for Professional Standards for Teaching”, MCEETYA, Carlton South, Australia, 2003.
Mizala, A. and Romaguera, P. (2004) “School and teacher performance incentives: The Latin American experience”, International Journal of Educational Development, Vol. 24, No. 6, pp 739-754.
Muñoz, M. and Chang, F. (2007) “The Elusive Relationship Between Teacher Characteristics and Student Achievement Growth: A Longitudinal Multilevel Model for Change”, Journal of Personnel Evaluation in Education, Vol. 20, No. 3-4, pp 147-164.
Nabors Oláh, L.; Lawrence, N. and Riggan, M. (2008) “Learning to learn from benchmark assessment data: How teachers analyze results”, Paper presented at the Annual Meeting of the American Educational Research Association, New York, 2008, available at http://www.cpre.org/images/stories/cpre_pdfs/aera2008_olah_lawrence_riggan.pdf.
National Board for Professional Teaching Standards (2007) “A Research Guide on National Board Certification of Teachers”, NBPTS, Arlington, VA, 2007.
Odden, A. and Kelley, C. (2002) Paying Teachers for What They Know and Do: New and Smarter Compensation Strategies to Improve Schools, Corwin Press, Thousand Oaks, California, 2002.
Office for Standards in Education (2006) “The logical chain: continuing professional development in effective schools”, OFSTED Publications n°2639, United Kingdom, 2006.
Organisation for Economic Co-Operation and Development (2008) Improving School Leadership, OECD, Paris, 2008.
Organisation for Economic Co-Operation and Development (2005) Teachers Matter: Attracting, Developing and Retaining Effective Teachers, OECD, Paris, 2005.
Ovando, M. and Ramirez, A. Jr (2007) “Principals’ Instructional Leadership Within a Teacher Performance Appraisal System: Enhancing Students’ Academic Success”, Journal of Personnel Evaluation in Education, Vol. 20, No. 1-2, pp 85-110.
Pecheone, R. and Chung, R. (2006) “Evidence in Teacher Education: The Performance Assessment for California Teachers (PACT)”, Journal of Teacher Education, Vol. 57, No. 1, pp 22-36.
Peterson, K. (2000) Teacher Evaluation: A Comprehensive Guide to New Directions and Practices, 2nd edition, Thousand Oaks, CA: Corwin Press.
Peterson, K.; Wahlquist, C. and Bone, K. (2000) “Student Surveys for Teacher Evaluation”, Journal of Personnel Evaluation in Education, Vol. 14, No. 2, pp 135-153.
Peterson, K.; Wahlquist, C.; Esparza Brown, J. and Mukhopadhyay, S. (2003) “Parents Surveys for Teacher Evaluation”, Journal of Personnel Evaluation in Education, Vol. 17, No. 4, pp 317-330.
Peterson, K. (2006) “Teacher Pay Reform Challenges States”, Stateline.org: where policy and politics news click, available at http://www.stateline.org/live/ViewPage.action?siteNodeId=136&languageId=1&contentId=93346.
Petty, T. (2002) “Identifying the Wants and Needs of North Carolina High School Mathematics Teachers for Job Success and Satisfaction”, NBPTS, 2002.
Ping Yan Chow, A.; King Por Wong, E.; Seeshing Yeung, A. and Wan Mo, K. (2002) “Teachers’ Perceptions of Appraiser-Appraisee Relationships”, Journal of Personnel Evaluation in Education, Vol. 16, No. 2, pp 85-101.
Pochard, M. (2008) Livre vert sur l’évolution du métier d’enseignant, Rapport au ministre de l’Education nationale, La Documentation française, Collection des rapports officiels, 2008.
Popham, J. (1997) “Consequential validity: Right Concern – Wrong Concept”, Educational Measurement: Issues and Practice, Vol. 16, No. 2, pp 9-13.
Robinson, V. (2007) “School Leadership and Student Outcomes: Identifying What Works and Why”, Australian Council for Educational Leaders, ACEL Monograph Series No. 41.
Sanders, W.; Ashton, J. and Wright, P. (2005) “Comparison of the Effects of NBPTS Certified Teachers with Other Teachers on the Rate of Student Academic Progress”, NBPTS, 2005.
Smith, T.; Gordon, B.; Colby, S. and Wang, J. (2005) “An Examination of the Relationship Between Depth of Student Learning and National Board Certification Status”, Office for Research on Teaching, Appalachian State University.
Stronge, J. and Tucker, P. (2003) Handbook on Teacher Evaluation: Assessing and Improving Performance, Eye On Education Publications, 2003.
Stronge, J.; Ward, T.; Tucker, P. and Hindman, J. (2007) “What is the Relationship Between Teacher Quality and Student Achievement? An Exploratory Study”, Journal of Personnel Evaluation in Education, Vol. 20, No. 3-4, pp 165-184.
Strudler, N. and Wertzel, K. (2008) “Costs and Benefits of Electronic Portfolios in Teacher Education: Faculty Perspectives”, Journal of Computing in Teacher Education, Vol. 24, No. 4, pp 135-142.
Training and Development Agency for Schools (2007a) “Models Performance Management Policy for Schools”, TDA, United Kingdom, 2007.
Training and Development Agency for Schools (2007b) “Professional Standards for Teachers: Why Sit Still in Your Career?”, TDA, United Kingdom, 2007.
Tucker, P.; Stronge, J. and Gareis, C. (2002) Handbook on teacher portfolios for evaluation and professional development, Larchmont, NY, Eye on Education.
UNESCO (2007) Evaluación del Desempeño y Carrera Profesional Docente: Una panorámica de América y Europa, Oficina Regional de Educación para América Latina y el Caribe, UNESCO Santiago, 2007.
Vandervoort, L.; Amrein-Beardsley, A. and Berliner, D. (2004) “National Board Certified Teachers and Their Students’ Achievement”, Education Policy Analysis Archives, Vol. 12, No. 46.
Weingarten, R. (2007) “Using Student Test Scores to Evaluate Teachers: Common Sense or Nonsense?”, United Federation of Teachers, available at http://www.uft.org/news/randi/ny_times/UFT_WMM_Mar07_v32.pdf.
Wertzel, K. and Strudler, N. (2006) “Costs and Benefits of Electronic Portfolios in Teacher Education: Student Voices”, Journal of Computing in Teacher Education, Vol. 22, No. 3, pp 69-78.
Xin, T.; Xu, Z. and Tatsuoka, K. (2004) “Linkage Between Teacher Quality, Student Achievement, and Cognitive Skills: A Rule-Space Model”, Studies in Educational Evaluation, Vol. 30, pp 205-223.
* * *
Figazzolo's critique of value-added modeling and the use of standardized tests:
Value-added evaluation methods, where teacher effectiveness and compensation are increasingly being tied to student scores on standardised tests, have raised concerns among teachers, unions and practitioners in general. As Froese-Germain (2011) puts it, these methods originate from a highly charged climate of data-driven accountability, and are increasingly common across the US. For example, the Los Angeles Unified School District is among a growing number of US school districts using the results of standardised tests to determine the “value-added” outcomes produced by the teacher (the value-added measure of teacher performance is related to gains in test scores in the teacher’s class over time). Other stories are reported from Chicago, where some reformers, such as Chicago Mayor Rahm Emanuel, want as much as half of a teacher’s evaluation to be linked to student test scores (Strauss, 2012).
Rather than placing student results in context, these methods issue a comprehensive judgment purely based on data developed through standardised calculations. However, as Baker et al (2010) highlight, VAM estimates have proven to be unstable across statistical models, years, and teaching classes. Studies quoted in Baker et al (2010) prove in fact that a teacher who appears to be ineffective in one year may achieve dramatically different results the following year. VAM’s instability can result from differences in the characteristics of students assigned to given teachers in a particular year and from specific evaluation measures. Such factors include: small samples of students (made even less representative in schools serving disadvantaged students and which have high rates of student mobility), other influences on student learning both inside and outside school, and tests which are poorly lined up with the curriculum teachers are expected to cover, or which do not measure the full range of achievement of students in the class.
A number of non-teacher factors have been found to have strong influences on student learning gains. These include the influence of other teachers, tutors or instructional specialists; school conditions — such as the quality of curriculum materials, specialist or tutoring supports, class size, resources, learning environment; and other factors that affect learning.
A review of the technical evidence leads Baker et al (2010) and other sources (Burris, 2012; Strauss, 2012) to conclude that, although standardised test scores of students are one tool school leaders can use to make judgments about teacher effectiveness, such scores can only be a part of an overall comprehensive evaluation. Any sound evaluation has to necessarily involve a balancing of all relevant factors in order to provide a more accurate view of what teachers do in the classroom and their contribution to student learning.
In addition, binding teacher evaluation and sanctions to test score results can discourage teachers from wanting to work in schools with the neediest students, while the large, unpredictable variation in the results and their perceived unfairness can undermine teacher morale (Baker et al, 2010). For instance, teachers show lower gains when they have large numbers of new English-learners and students with disabilities than when they teach other students. This is true even when statistical methods are used to “control” for student characteristics (Darling-Hammond, 2012). Surveys have found that teacher attrition and demoralisation have been associated with test-based accountability efforts, particularly in high-need schools.
The use of VAMs is also associated with a narrowing of the curriculum: a de facto curriculum whose subject matter is defined by what is tested. Teachers who rate highest on the low-level multiple-choice tests currently in use are often not those who raise scores on assessments of more challenging learning (Darling-Hammond, 2012). Some believe that the pressure to teach to “fill-in-the-bubble tests” will further reduce the focus on research, writing, and complex problem-solving, areas in which students will need competence to compete with their peers in high-achieving countries (Darling-Hammond, 2012).
Finally, as far as merit-pay systems are concerned, tying teacher evaluation and remuneration to test results is problematic on numerous levels, not least because it reinforces a competitive spirit that undermines teacher collegiality and teamwork (Froese-Germain, 2011).
POSTSCRIPT: On a note related to the last point, one study from Nashville, Tennessee, has concluded that merit pay alone does not raise student performance: Melanie Moran, "Teacher performance pay alone does not raise student test scores," Vanderbilt News, September 21, 2010.