Methodological design


This section is structured as follows:

  • Design table per question
  • Optimising the overall design
  • Developing a tool
  • Feasibility (evaluability) of a question

WHAT DOES IT MEAN? 


What is the purpose?

To set up the method that will allow the external evaluation team to answer the questions and to come to an overall assessment. In addition to selected questions, judgement criteria, indicators and targets, the evaluation method includes:

  • a strategy for collecting and analysing data
  • selected investigation areas
  • a series of specifically designed tools
  • a work plan.

When does it take place?

The evaluation team starts designing a provisional method as early as the drafting of its proposal, in order to draw up cost estimates. A key assumption at this stage is the extent to which the evaluation will rely on secondary data or will involve specific data collection work in the field.

The main frame of the method is then established during the inception stage, in line with the evaluation questions, judgement criteria, indicators, data collection tools and analysis strategy.

The method is refined and finalised before the field phase and fully described in the first phase report (desk).
The final report includes a short and sharp presentation of the evaluation method, together with its limitations, if any. The method is fully described in an annex, including the initial design, problems encountered, solutions found, the method actually implemented, and its limitations.
The evaluation method is designed through an iterative process at three levels:

  • A question-by-question approach, allowing the evaluation team to prepare the design tables with the aim of adequately answering each question.
  • An overall approach which cuts across the questions, and which allows the evaluation team to optimise the evaluation method as a whole, whilst matching time and resource constraints.
  • A series of specifically developed tools.


DESIGN TABLE PER QUESTION


What does this mean?

The design table explains how an evaluation question will be answered, including the chain of reasoning which connects data, findings and conclusions.
An example of a design table is provided on this site.

When is it constructed?

A design table is developed for each question and progressively refined in successive versions:

  • Preliminary version appended to the inception report.
  • Draft version(s) prepared during the first phase of the evaluation as the methodological design is progressively optimised.
  • Final version attached to the first phase report (desk).

Opening the table

The first lines of the table summarise the steps which have already been taken (see from question to indicator), i.e.

  • Text of the question
  • Comment
  • Scope of the question
  • Judgement criterion (or criteria)
  • Indicators
  • Targets

Sub-questions

What does this mean?

Together with the judgement criteria, indicators and targets, the sub-questions are meant to explain how the evaluation team will answer the question, for instance:

  • How have indicators changed over the evaluated period?
  • To what extent can the change be qualified as a success?
  • How far have EC activities contributed to explaining the observed change? Through which mechanisms? Is there evidence that such mechanisms have been working as assumed?
  • How far does evidence support alternative explanations? Is there evidence that the observed change is owing to other development partners, other EC policies, the Government or other external factors?

Who does what, and when?

A set of evaluation questions is drawn up at the inception stage, together with sub-questions.
The responsibilities are as follows:

  • The reference group validates the questions.
  • The external evaluation team breaks down each question into sub-questions in a design table.

What is the purpose?

The sub-questions describe the chain of reasoning through which the evaluation team plans to answer the question. The intended reasoning is indicative but it is worth clarifying it in advance because:

  • The reference group members get an opportunity to provide specific advice and inputs
  • All evaluation team members understand why they are collecting and analysing data, and therefore work more effectively
  • The team members who are not familiar with evaluation receive useful guidance on which data are to be collected and analysed.

It may furthermore be worth suggesting provisional sub-questions together with the draft set of key questions. Members of the reference group will then realise that many of their expectations will be satisfied through answering the sub-questions. It will help them to accept the principle of a limited list of well focused evaluation questions.

Sub-questions pertaining to indicators

These sub-questions may pertain to:
(1) the current level / status of the indicators, possibly with a break-down per country, area, social group, etc., for instance:

  • What is the current value of quantitative indicator X at country level, and for targeted areas/groups?

(2) changes in the indicators, for instance:

  • Do stakeholders perceive a positive change in qualitative indicator Y over the evaluated time period?

As seen in the examples above, the sub-questions may be quantitative or qualitative.

Question:

  • To what extent has EC support improved the capacity of the educational system to enrol pupils from disadvantaged groups without discrimination?

Sub-question:

  • What is the change in the number of experienced and qualified teachers per 1000 primary school age children respectively at country level, in poor urban areas and in areas where ethnic minority X concentrates?

In this example, the sub-question is simply meant to show how the indicator (number of experienced and qualified teachers) will be applied.
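As an illustration of how such a quantitative indicator might be applied, the following sketch computes the teachers-per-1000-children ratio at country level and for targeted areas. All figures and area names are invented for the example.

```python
# Hypothetical illustration: computing the indicator "experienced and
# qualified teachers per 1000 primary school age children" for the areas
# named in the sub-question. All figures and area names are invented.
areas = {
    "country level":           {"teachers": 52_000, "children": 6_500_000},
    "poor urban areas":        {"teachers": 3_100,  "children": 620_000},
    "ethnic minority X areas": {"teachers": 850,    "children": 240_000},
}

def teachers_per_1000(teachers: int, children: int) -> float:
    """Indicator value: teachers per 1000 children, rounded to one decimal."""
    return round(teachers / children * 1000, 1)

for area, counts in areas.items():
    value = teachers_per_1000(counts["teachers"], counts["children"])
    print(f"{area}: {value} teachers per 1000 children")
```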

Sub-questions pertaining to analysis

These sub-questions are written with a view to:
(3) Confirming assumptions about the success of the intervention and substantiating a positive answer to the question, for instance:

  • Has the design of EC support included a commitment to monitor performance indicators related to effect X?
  • Was such monitoring actually undertaken?
  • Was the monitoring data subject to periodic discussion among development partners?
  • Have partners taken action as a follow-up to such discussions?
  • Were such actions designed with a view to achieving effect X?
  • Etc.

(4) Challenging assumptions about the success of the intervention and substantiating a negative answer to the question, for instance:

  • Have other development partners pushed for monitoring performance indicators related to effect X?
  • Have non-state actors contributed to putting the issue of achieving effect X onto the political agenda?
  • How far did other partners contribute towards shaping the actions taken in favour of disadvantaged groups?

Question:

  • To what extent has EC support improved the capacity of local authorities to design adequate rural development strategies?

Sub-question:

  • To what extent has the EC been able to put decentralisation at the top of the agenda in its policy dialogue with the Government?

In this example the sub-question relates to a specific short-term result (promotion of decentralisation through policy dialogue) which is a driver to the wider impact highlighted in the question (capacity building).

Sub-questions pertaining to judgement

These sub-questions are meant to assist in the formulation of conclusions involving explicit value judgements. They are written with a view to:
(5) applying and possibly refining the judgement criteria in the specific context of the intervention, for instance:

  • Do stakeholders spontaneously focus on the same judgement criteria as those selected for the evaluation? If not, why not?

(6) applying or developing the targets in the specific context of the intervention, for instance:

  • Which are the areas / groups in the country with the best performance as regards the selected judgement criterion? Among them, which ones can legitimately be compared with targeted areas / groups?

Question:

  • To what extent has EC support been efficient in strengthening the road network?

Sub-question:

  • To what extent has the EC strengthened the road network?

In this example, the sub-question relates to a prerequisite (the road network is strengthened) before applying the judgement criterion (efficiency in strengthening the road network).

Analysis strategy

Four strategies can be considered:

  • Change analysis, which compares measured / qualified indicators over time, and against targets
  • Meta-analysis, which extrapolates upon findings of other evaluations and studies, after having carefully checked their validity and transferability
  • Attribution analysis, which compares the observed changes and a "without intervention" scenario, also called counterfactual
  • Contribution analysis, which confirms or disconfirms cause-and-effect assumptions on the basis of a chain of reasoning.

The last three strategies are appropriate for questions which require a cause-and-effect analysis. The first is appropriate in other instances.
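The first of these strategies, change analysis, lends itself to a simple numerical sketch: a measured indicator is compared over the evaluated period and against its target. The indicator (a hypothetical enrolment rate, in %) and all values below are invented for illustration.

```python
# Minimal sketch of change analysis: compare a measured indicator over
# the evaluated period, and against its target. All values are invented.
baseline, latest, target = 62.0, 71.5, 75.0

change = latest - baseline                  # observed change over the period
target_gap = target - latest                # distance still to cover
achievement = change / (target - baseline)  # share of targeted progress achieved

print(f"Change over period: {change:+.1f} points")
print(f"Remaining gap to target: {target_gap:.1f} points")
print(f"Progress towards target: {achievement:.0%}")
```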

Investigation areas

The evaluation team may consider collecting and analysing data at the level of the intervention as a whole, or investigating some areas more specifically, for instance:

  • All sub-questions will be addressed through an investigation into a selection of countries and will include country notes and a visit to each country.
  • In addition to using national statistics, the evaluation team will investigate a selection of districts respectively typical of (a) the targeted group / area, and (b) the best performing groups / areas.
  • The evaluation team will carefully select ten projects which will be subject to an in-depth investigation in order to address some of the sub-questions.


OPTIMISING THE OVERALL DESIGN


What is the purpose?

To finalise the overall evaluation method in a way which cuts across the evaluation questions and combines the evaluation tools in a good enough mix, given the available time and resources.

When does it take place?

The evaluation method is designed through an iterative process at three levels:

  • A question-by-question approach, allowing the evaluation team to prepare the design tables with the aim of adequately answering each question.
  • An overall approach which cuts across the questions, and which allows the evaluation team to optimise the evaluation method as a whole, whilst matching time and resource constraints.
  • A series of specifically developed tools.

Several iterations may be needed to allow the evaluation team to optimise the overall design whilst ensuring a high-quality answer to each question.

Selecting tools

In parallel with the design tables, which are established on a question-by-question basis, the evaluation team designs its overall evaluation method which covers all questions and includes a number of tools such as:

  • A documentary analysis applying to a given set of reports
  • An analysis of context indicators
  • A series of interviews with stakeholders belonging to a given category
  • A questionnaire
  • A series of case studies
  • A series of focus groups or participatory meetings
  • An expert panel
  • The building and analysis of an ad hoc database

These examples are by no means exhaustive. The evaluation team may even have to develop a completely new tool, where relevant.

Combining tools and questions

The evaluation team draws up the list of all evaluation tools suggested in the design tables. Each tool is then considered from the viewpoint of its capacity to help answer several questions and sub-questions, for instance:

  • A series of interviews with Government officials: this tool may help answer questions related to "alignment of EC strategy on Government priorities", "policy dialogue in major policy domains", and "co-ordination among development partners".
  • A focus group gathering participants interested in primary education: this tool may help answer questions related to "efficiency of selected funding modalities", "mainstreaming of gender equality", and "improved access to basic education for the poor".

An image of this process is given by a matrix in which, for instance, the first tool helps answer sub-questions Aa, Cf and Eb.

Such a matrix shows that the set of tools will provide the evaluation team with possibilities of cross-checking data on sub-questions Aa and Eb. Tools and questions are combined in order to optimise such possibilities of cross-checking.
The matrix is for explanatory purposes only. In practice, a matrix may be impractical if the number of tools and/or sub-questions exceeds a dozen, which is typically the case in a country strategy evaluation. A pragmatic approach consists of drawing up a list of all questions and sub-questions addressed when developing each tool. An example is provided on this site.
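The cross-checking logic described above can be sketched as a small mapping of tools to sub-question codes: any sub-question covered by two or more tools can be cross-checked. The tool names and coverage sets below are hypothetical; the codes (Aa, Cf, Eb, …) follow the example in the text.

```python
# Sketch of the tools / sub-questions matrix: which tool covers which
# sub-question, and which sub-questions can be cross-checked because two
# or more tools cover them. Tool names and coverage sets are hypothetical.
from collections import Counter

coverage = {
    "Interviews with Government officials": {"Aa", "Cf", "Eb"},
    "Focus group on primary education":     {"Aa", "Bd", "Eb"},
    "Documentary analysis":                 {"Cf", "Dg"},
}

# Count how many tools address each sub-question.
counts = Counter(sq for sub_questions in coverage.values() for sq in sub_questions)
cross_checkable = sorted(sq for sq, n in counts.items() if n >= 2)
print("Sub-questions answerable from two or more tools:", cross_checkable)
```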

Verifying the adequacy of tools

The evaluation team confirms that each tool envisaged is adequate in the sense that:

  • It fulfils the required function, e.g. a focus group is adequate for collecting beneficiaries' perceptions; a case study is adequate for understanding cause-and-effect mechanisms; a series of interviews is adequate for collecting the views of officials in partner institutions, and so on.
  • It is in line with the selected analysis strategy, e.g. a series of case studies is adequate in the instance of a contribution analysis; a questionnaire to beneficiaries and non-beneficiaries is adequate in the instance of an attribution analysis; a panel of experts is adequate in the instance of a meta-analysis, and so on.
  • It fits into the identified constraints, e.g. a questionnaire survey is appropriate if, and only if, a satisfactory sample can be drawn; a database can be analysed if, and only if, missing data are limited in number.

The evaluation team explains and justifies its main technical choices, with alternative options, pros and cons, and associated risks.

Articulating tools

The evaluation team assesses whether the selected tools may reinforce one another, for instance:

  • A series of interviews may help in identifying relevant papers which will be submitted to a documentary analysis.
  • A series of interviews may help in selecting the stakeholders to be invited to a focus group, and the issues to be discussed.
  • A series of interviews may help in refining the questions to be put to a sample of beneficiaries.
  • A series of case study monographs may be reviewed by a panel of experts with a view to strengthening the analysis and to deriving more accurate findings.

Preparing the overall assessment

In the process of progressively optimising its design, the evaluation team examines all the design tables in a cross-cutting manner with a view to preparing its final synthesis, i.e. an overall assessment that draws upon the answers to all evaluation questions.
More than being a mere compilation of findings and conclusions, the overall assessment may involve specific analyses which deserve to be designed in advance, for instance:

  • Specific themes like policy dialogue with the partner country's Government, decentralisation of EC procedures, etc. This calls for adequate sub-questions to be inserted in several design tables.
  • If the selected questions, sub-questions and tools leave a significant part of the support unexplored, then a few additional studies may be planned in order to provide a broader picture of the overall support (e.g. meta-analysis of monitoring reports). An overall analysis of management databases may also be undertaken with the same purpose.

Allocating resources

In the process of designing its method, the evaluation team tries to adequately share its limited resources between questions and sub-questions. Some questions deserve to be addressed with costly tools such as questionnaire surveys of end users, several case studies, focus groups, etc. Other questions should rather be answered on the basis of a documentary analysis only and a few interviews with EC and Government officials.
One reason why a question should be allocated more resources is the need for credibility. If a question is considered to be politically sensitive and if the answer is likely to trigger debate, then it deserves a stronger design, possibly with more cross-checking of sources, larger samples, several focus groups or case studies, etc.
It is also wise to invest substantial resources in a question that raises feasibility problems. The following are examples of specific difficulties calling for a particularly strong design:

  • A question relates to an innovative aspect of the intervention in an area where expertise is scarce.
  • A question relates to a component of the intervention which has just reached the end users in the field, and the implementing actors have no idea of how the targeted people are reacting.
  • A question relates to a far-reaching impact that is logically distant from EC activities and the abundance of external factors makes it difficult to assess cause-and-effect relations.

A question is rarely unevaluable in the absolute. It is more likely to be an accumulation of difficulties and constraints that leads to feasibility problems. At the earliest stages of the evaluation, it is often possible to amend a difficult question so as to make it more evaluable. This can be done, for example, by limiting the scope of the question or choosing to apply it to a less distant effect or to a probable effect if the real effect is not yet observable. Once a difficult question has been validated, the evaluation team has to design an appropriate method, and to allocate adequate resources.

Cost and time constraints

Successive versions of the method are designed within the evaluation team until the following constraints are matched:

  • The implementation of the evaluation tools fits into the overall time schedule of the evaluation process
  • The cost of the evaluation tools (human resources, technical costs, travel and daily subsistence) fits into the overall budget of the evaluation
  • The availability of qualified workforce in the field is sufficient for implementing evaluation tools professionally
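The first two constraints above can be sketched as a simple validation of a hypothetical tool mix against an invented budget and time schedule (assuming, for the sketch, that tools run in parallel, so the schedule constraint is driven by the longest-running tool).

```python
# Hedged sketch: verifying a hypothetical tool mix against an invented
# budget and time schedule. Assumes tools run in parallel, so the schedule
# constraint is driven by the longest-running tool.
tools = {
    "questionnaire survey": {"cost": 18_000, "weeks": 6},
    "case studies (x5)":    {"cost": 25_000, "weeks": 8},
    "interviews":           {"cost": 7_000,  "weeks": 3},
}
budget = 60_000
weeks_available = 10

total_cost = sum(t["cost"] for t in tools.values())
longest = max(t["weeks"] for t in tools.values())

assert total_cost <= budget, "tool mix exceeds the evaluation budget"
assert longest <= weeks_available, "a tool does not fit the time schedule"
print(f"Cost {total_cost} within budget {budget}; longest tool takes {longest} weeks")
```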


DEVELOPING A TOOL


No evaluation method without tools

Evaluation tools are needed in order to collect primary and secondary data, to analyse data, and to formulate value judgements (or reasoned assessments). The external evaluation team may carry out some of these tasks without evaluation tools, but several tools are always needed for answering each evaluation question.
Tools range from simple and usual ones like database extracts, documentary analysis, interviews or field visits, to more technical ones like focus groups, modelling, or cost benefit analysis. This site describes a series of tools that are frequently used.
By using an appropriate mix of tools, the evaluation team strengthens the evidence basis of the evaluation, the reliability of data, the validity of its reasoning, and the credibility of its conclusions.

How are tools chosen?

Tools are chosen during the iterative process of designing the overall evaluation method, with an aim to:

  • Contribute to answering all questions and sub-questions, and to formulating an overall assessment
  • Gather / collect data, assist in analyses and formulation of value judgement (or reasoned assessment)
  • Facilitate cross-checking and triangulation
  • Reinforce each other through appropriate combinations
  • Match contextual constraints like: availability of expertise and data, allocated resources, time schedule.

When is it developed?

All evaluation tools are developed progressively and finalised before the beginning of the field phase, although some tools need to be implemented, and therefore developed earlier in the process, e.g. interviews with key stakeholders at the inception stage, analysis of management databases at the desk phase, etc.

Developing a tool

Whilst the set of evaluation tools is to be selected as part of the overall evaluation design, each tool is to be developed separately. An example of a developed tool is provided on this site.
Developing a tool may be a matter of a section in the inception or desk report. However, the task may need to be further formalised, including in the form of fully fledged terms of reference, when several members of the evaluation team work separately, for instance if the work extends to different countries, or if the tool is being sub-contracted.
Tool development proceeds through seven steps as follows:

Questions and sub-questions

The evaluation team lists the questions and sub-questions that have to be addressed by the tool. It refers to the design tables.

Technical specifications

The evaluation team develops the technical specifications of the tool through a preparatory stage. Technical specifications depend on the type of tool. They cover issues like:

  • Sampling, selection of interviews, of case studies, of documents …
  • Questions, themes of a documentary analysis, content of a monograph, fields of a database …
  • Mode of implementation
  • Duration of interview, focus group, visit, …

Caution! - When developing a questionnaire or an interview guide, the evaluation team should not proceed by copying and pasting evaluation sub-questions. If evaluation questions and sub-questions are naïvely passed on to informants, there is a considerable risk of bias.
Technical specifications need to be further formalised when several members of the evaluation team work separately, especially if the work extends to different countries.

Risk management

The evaluation team assesses the main risks with data collection and analysis, as well as potential biases. As far as relevant, it prepares second best solutions in case the tool cannot be applied satisfactorily. The following lines provide examples of risks associated with evaluation tools, and examples of second best solutions:

  • Considering database X: if more than 20% of the data are still missing after 4 weeks, then analyse the available data and ask expert Y to comment upon the validity of findings.
  • Considering the series of interviews X: if more than 30% of informants cannot be reached after 3 weeks, then collect alternative "second-hand" information from expert Y.
  • Considering questionnaire X: if the number of respondents falls below 200, then gather a focus group meeting and cross-check results with those of the questionnaire survey.

This list is not exhaustive.
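The fallback rules above can be restated as explicit threshold checks. The thresholds (20% / 4 weeks, 30% / 3 weeks, 200 respondents) come from the examples in the text; the function names and test values are invented.

```python
# The fallback rules restated as explicit threshold checks. Thresholds come
# from the examples above; function names and test values are invented.
def database_fallback(missing_share: float, weeks_elapsed: int) -> bool:
    """Database X: analyse available data and consult expert Y if more
    than 20% of data are still missing after 4 weeks."""
    return missing_share > 0.20 and weeks_elapsed >= 4

def interviews_fallback(unreached_share: float, weeks_elapsed: int) -> bool:
    """Interviews X: collect second-hand information from expert Y if more
    than 30% of informants cannot be reached after 3 weeks."""
    return unreached_share > 0.30 and weeks_elapsed >= 3

def questionnaire_fallback(respondents: int) -> bool:
    """Questionnaire X: add a focus group if respondents fall below 200."""
    return respondents < 200

print(database_fallback(0.25, 4))   # threshold exceeded: trigger fallback
print(questionnaire_fallback(250))  # enough respondents: no fallback
```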

Mode of reporting

The outputs vary from one tool to another. They may take the form of tables, lists of quotations, lists of verbatims, monographs, etc. The evaluation team decides on how the outputs will be reported, for instance:

  • Full length data delivered to the evaluation team leader
  • Full length data included on a CD-ROM attached to the final report (in which case, some documents may need to be anonymised)
  • Tables or boxes to be inserted in the final report

The evaluation team also decides on how to report about the implementation of the tool and the associated limitations if any, e.g.

  • Note to be inserted in the methodological annex appended to the final report
  • Short methodological comment to be inserted in the final report itself.

Responsibilities

The tasks and roles are shared among the evaluation team members, e.g.

  • Who is to implement the tool?
  • Who will ensure harmonisation in case several team members implement the tool in various countries / areas?
  • Who is to decide upon second best alternatives in case of difficulties?
  • Who is responsible for delivering data?
  • Who will write the methodological comments?
  • Who is to assure quality?

Quality

Quality criteria are precisely defined. Depending on the tool they may cover issues like:

  • Coverage of themes, questions, issues
  • Accuracy
  • Identification of potential biases
  • Respect of anonymity rules or other ethical rules
  • Formal quality, language, spelling, layout

This site proposes a series of quality check-lists for frequently used tools.

Time schedule and resources

In some instances, it may be necessary to specify practicalities like:

  • date of start / end
  • human resources allocated
  • travel arrangements
  • technical expenditures

Example

Evaluation tool: "Interviews Education"

Name of tool: Interviews with stakeholders at national level in the area of education

Questions and sub-questions addressed:

Question E - Education

  • Has the design of EC support to primary education included a commitment to monitor the quality of educational services supplied to disadvantaged groups?
  • Was such monitoring actually undertaken?
  • Were the monitoring data subject to periodic discussion among development partners?
  • Have partners taken action as a follow-up to such discussions?
  • Were such actions designed with a view to providing disadvantaged groups with high quality primary education?
  • Have other development partners pushed for monitoring the quality of educational services provided to disadvantaged groups?
  • Have non-state actors contributed to raising the issue of disadvantaged groups on the political agenda?
  • How far did other partners contribute to shaping the actions taken in favour of disadvantaged groups?
  • Is it acceptable for stakeholders to focus on the quality of teachers and premises as a way to judge discrimination in the access to primary schools?

Question D - Policy dialogue

  • What has the EC's input into policy dialogue been in the area of primary education? How does this input compare to that of other partners?
  • Did the EC input focus on the quality of educational services supplied to disadvantaged groups?

Question X - Cross-cutting issues

  • To what extent has the EC mainstreamed gender in its support to primary education? How does it compare to other partners in this respect?
Technical specifications: Ten interviews at the Ministry of Education, Primary Education Inspectorate, and at the National Teachers Union and NGOs most active in the field of primary education.
Face-to-face 40-minute interviews; contacts made by the evaluation team; EC Delegation informed; guarantee of anonymous treatment; minutes not sent to informants; informants invited to the final discussion seminar.

Risk management: Some informants may express themselves in a purposely biased way. Interviews should therefore focus on facts rather than opinions.
Mode of reporting (internal to evaluation team):

  • Identification fiches for all interviews, with full details
  • Minutes including profile of interviewee, verbatims (at least one per interview and per sub-question if relevant), document provided, contact provided, comment on potential biases
  • Overall synthesis per sub-question

Mode of reporting (in the final report):

  • List of interviewees (to be inserted in the overall list of interviewees appended to the report)
  • Anonymised verbatims (on a CD-ROM attached to the report)
  • Methodological description of the tool with comments on biases (about 50 lines to be inserted in the methodological appendix)
Responsibilities:

  • Local education expert: contacts with informants, draft interview guide, interviews and reports
  • Team leader: approval of interview guide

Quality:

  • Quality criteria: coverage of sub-questions with verbatims, self-assessment of biases
  • Quality control by team leader
  • Informants may be contacted again by another team member for verification; they should be informed of that.
Time schedule: Start date; interviews between … and …; date of reporting

Resources: … person x days


FEASIBILITY (EVALUABILITY) OF A QUESTION


What is this?

Certain questions are easy to answer while others may raise evaluability problems. It is therefore necessary to assess the feasibility of evaluation questions from the outset.

What is the purpose?

  • To ensure that the evaluation provides reliable and credible answers to the questions asked.
  • To exclude or amend questions that are too difficult to answer.
  • To adjust the available time and other resources in case difficult questions have to be answered.

What has to be taken into account?

To establish whether a question is evaluable, we check:

  • Whether the concepts are stable (Are the main terms of the question understood by everyone in the same way?)
  • Whether explanatory assumptions can be identified (What are the external factors and the cause-and-effect assumptions?)
  • Whether access to the field and data collection entail major difficulties.

What are the most frequent limitations?

A highly innovative activity

If the question concerns an innovative instrument, activity or procedure, the following difficulties may arise:

  • It is difficult to define the terms of the question without ambiguity.
  • There is a lack of expertise to understand the cause-and-effect mechanisms.

A very recent activity

If the question concerns a recently implemented activity:

  • The activity has not yet produced observable effects
  • The informants have not yet stabilised their opinions.

Managerial weaknesses

If the question concerns activities in which there are or were managerial weaknesses, the following difficulties may be encountered:

  • The monitoring data and reports are inadequate or unreliable.
  • The managerial difficulties have generated conflicts that limit access to certain informants or cause those informants to bias their answers.

In case of a strong suspicion of illicit or illegal practices, it is preferable to postpone the evaluation question for later and to start with an audit.

A scope that is too complex

If the question concerns a multi-dimensional or multi-sectoral scope, the following difficulties may be encountered:

  • In view of the available time and budget, there are too many data to collect, informants to meet and analyses to perform, and they are too dispersed.

A far-reaching impact

If the question concerns a far-reaching impact which is connected to the evaluated activity through a long chain of causes and effects, then the following difficulties may be encountered:

  • There are so many external factors and they are so influential that it becomes impossible to analyse the contribution of the intervention.

An intervention that is too marginal

If the question concerns a very small activity compared to other neighbouring policies or to the context, the following difficulties may be encountered:

  • The evaluated activity does not attain the critical mass that would allow an analysis of its contribution.

Recommendations

A question is rarely unevaluable in the absolute. It is more likely to be an accumulation of difficulties and constraints that makes the question difficult.
When a question is considered too difficult, it is often possible to amend it to make it more evaluable. This can be done, for example, by limiting the scope of the question or choosing to apply it to a less distant effect or to a probable effect if the real effect is not yet observable. This option may be preferable to exclusion of the question.
Policy-makers or any other actor may insist on asking a question that is clearly too difficult. In such an instance, it is useful to provide a full explanation of the difficulties and to show that evaluations have limitations and cannot answer all questions.

Author

FC
Former Capacity4dev Member
last update
7 December 2022
