3  Practice testing

In this chapter you’ll learn how to use AI to test yourself. Practice testing, or retrieval practice, is one of the most effective ways to consolidate learning, and it can be applied in several formats. This chapter was written with a data skills course in mind but it applies to any type of course.

You should have a specific week/chapter/lecture of your course to work with. For PsyTeachR courses like Applied Data Skills, we’ll be using the Intended Learning Outcomes, functions used, and glossary to help instruct the AI. If you’re not working through a course, or it’s not a PsyTeachR course, it would be helpful to have a list of functions, concepts, or skills that you want to test.

It’s very important that you use Copilot through your UofG account so that the course content you share is not used to train the model. You should also ensure that you have the consent of your lecturer to do this (if you’re enrolled on Applied Data Skills, you have our consent to upload any course material to Copilot, but not to platforms like ChatGPT).

3.1 Question types

The first thing we’re going to do is set up several different prompts to create different types of practice questions. Which of these you end up using will depend on what you’re trying to study. Different question formats test different aspects of your knowledge. Here’s what each type is for and what the AI will produce when you ask for it.

3.1.1 Questions that test recognition

3.1.1.1 Multiple choice questions

MCQs test recognition and your ability to spot the best answer among distractors. Each item has a stem (the question) followed by four labelled options (A–D). Only one is correct. MCQs are efficient for covering a wide range of content, easy to self-check, and useful for practising discrimination between similar concepts. However, they can also encourage recognition more than recall, and can sometimes be answered by guessing or test-wise strategies.

Example:

Which of the following functions is used to create a scatterplot in R?

A) geom_bar()
B) geom_point()
C) geom_boxplot()
D) geom_histogram()

3.1.1.2 True-or-false (TOF)

TOF questions are a quick way to check factual accuracy and basic conceptual understanding: each item is a single statement for you to judge as true or false. They are very quick to complete and good for simple checks of knowledge, but the chance of guessing correctly is high (50%), they can oversimplify complex concepts, and they have limited diagnostic value.

Example:

True or false? The mean() function in R returns the median of a numeric vector.
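
If you want to check a statement like this for yourself, a quick sketch in the console settles it (the vector here is made up purely for illustration):

# the statement is false: mean() returns the arithmetic mean, not the median
x <- c(1, 2, 3, 100)

mean(x)    # 26.5
median(x)  # 2.5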

3.1.1.3 Fill-in-the-blanks (coding)

These questions strengthen fluency with R syntax by making you supply a missing function or argument in a line of R code. They support recall of key syntax and reduce cognitive load compared to writing code from scratch, but they can be too easy and may not transfer well to real coding tasks.

Example:

ggplot(mtcars, aes(x = wt, y = mpg)) + ____()
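
One plausible completion, assuming the intended answer is a scatterplot, is geom_point():

# completed version: geom_point() adds the scatterplot layer
library(ggplot2)

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()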

3.1.1.4 Fill-in-the-blanks (theory)

Similarly, for theory, fill-in-the-blank (FITB) questions help you memorise key terms, definitions, or concepts without giving the full answer. These questions will be a sentence with 1–2 blanks, sometimes with hints. FITB questions are good for reinforcing vocabulary and key concepts and are quick to create and practise, but there is a risk of rote memorisation without deeper understanding, and answers may sometimes be ambiguous.

Example:

“A variable that can take on any value between two points is called a ______.”

3.1.2 Questions that test production

3.1.2.1 Short-answer questions (SAQs)

SAQs require you to recall and explain in your own words, building deeper understanding. These are focused, open questions that should be answered in <100 words. SAQs promote active recall and deeper processing and are flexible enough to test conceptual understanding. However, they are harder to self-mark and may be more time-consuming to generate and answer.

Example:

Explain the difference between a categorical and a continuous variable.
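
If it helps to ground your answer in R, you can contrast the two types of variable using a built-in dataset (penguins from the palmerpenguins package is used here purely as an illustration):

library(palmerpenguins)

# categorical: a fixed set of groups
class(penguins$species)         # "factor"

# continuous: numeric measurements on a scale
class(penguins$bill_length_mm)  # "numeric"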

3.1.2.2 Coding problems

Coding problems let you apply your skills to solve a real task, similar to authentic assessments. They usually involve a short programming challenge using a real dataset. Coding problems closely mimic real-world problem solving, encourage transfer of knowledge, and consolidate multiple skills at once. The higher difficulty can be discouraging for beginners and they may be harder to self-assess without feedback.

Example:

“Using the penguins dataset, create a boxplot of body mass grouped by species.”
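
A sketch of one possible solution, assuming the palmerpenguins package is installed, might look like this:

library(palmerpenguins)
library(ggplot2)

# boxplot of body mass (g) for each species;
# rows with missing body mass are dropped with a warning
ggplot(penguins, aes(x = species, y = body_mass_g)) +
  geom_boxplot()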

3.1.2.3 Error mode

Error mode is debugging practice which builds resilience and problem-solving skills, and teaches you to spot common mistakes. It involves a runnable piece of R code that contains one plausible error. Error mode develops error-detection and debugging skills and mirrors real-world coding experience. However, it can be frustrating for novices and requires baseline knowledge to be effective.

Example:

ggplot(mtcars, aes(x = wt, y = mpg))
geom_point()

(Error: missing + before geom_point())

3.2 Practical tip: Built-in datasets

When practising coding questions with AI, you need a dataset you can actually run code on. A common problem with AI-generated exercises is that the AI invents datasets or column names that do not exist, which makes it impossible to test your solution. One way to avoid this is to rely on built-in datasets.

Built-in datasets in R are sample datasets that come pre-loaded with the software or with specific packages. They cover a variety of domains (e.g., cars, gemstones, movies, penguins) and are designed to help you practise data manipulation, analysis, and visualisation without importing external files.

You can get a full list of available datasets by running data() in the console. Base R provides some, and additional packages (e.g., tidyverse, palmerpenguins) add more. Remember that a package must be loaded before its datasets are accessible.

# see list of datasets
data()

# load tidyverse to get access to extra sets
library(tidyverse)

# see list of datasets expanded to include tidyverse ones
data()

# load in dataset to environment so it can be used
data("starwars")

3.3 Question prompt

We will now design a prompt that will set up Copilot to give you different types of questions.

Note: Activity 1

Edit the template below to suit your needs. For example, change the role, the sources you’re going to use, and, if relevant, the datasets it should draw on in the constraints. You can also change the other details, although most of them should work for you without any changes.

Tip: Template prompt to copy and paste into Copilot

Role: You are a strict practice-testing tutor for second year undergraduate students learning R.

Sources: I will paste Intended Learning Outcomes (ILOs), function lists, and glossary terms from my course Applied Data Skills.

Protocol:

  1. Ask one question at a time of the requested type.

  2. Do not reveal the answer until I reply.

  3. Do not give the answer away in the suggested follow-up prompts. For example, for MCQs, give suggested prompts for each answer option, not just the correct answer.

  4. After I answer, mark it, then give a short explanation (2–4 sentences).

Constraints:

  1. All questions must align to ILOs and be challenging but fair.

  2. For coding questions, avoid imaginary datasets or columns. Instead, use only the following datasets: starwars, diamonds, penguins.

Controls I will use:

  1. type: … (see list below)

  2. calibrate: harder | easier

Question Types (with rules):

  1. MCQ (Multiple choice) – 4 options (A–D), 1 correct. Plausible distractors. After marking, explain each option briefly.

  2. TOF (True/False) – One statement; avoid trivially true/false. After marking, if false, rewrite as a correct statement.

  3. SAQ (Short answer) – Ask about one concept. I should be able to answer in <100 words. After marking, provide a 2–3 point ideal outline.

  4. FITB_code (Fill-in-the-blank: coding) – One line of R with a missing function/argument. Must run on an approved dataset. State expected output shape (e.g., “tibble: 3 × 2”).

  5. FITB_theory (Fill-in-the-blank: theory) – A statement with up to 2 blanks. Give part-of-speech hints (e.g., [noun]).

  6. CP (Coding problem) – Minimal complete example using an approved dataset. Specify columns that exist. Do not invent data.

  7. EM (Error mode) – Provide a minimal reproducible example that fails using a specified dataset. The code should include the call to load the dataset and any required packages. Include exactly one plausible beginner-level error but give no hint as to what the error is. State intended outcome. Hold back the fix until after I attempt a solution. Then reveal the correction and what it teaches.

First action: Confirm readiness and ask me to provide the sources.

3.4 Sources

Once you have set up this prompt, you can give it the sources to work from—for example, copy and paste the ILOs, a list of functions, and/or key terms you want it to quiz you on. Do not worry about formatting: just paste the text in as it is.

Note: Activity 2

Add in your sources for the content you want to study.

Caution

The suggested follow-up prompts in Copilot sometimes indicate what the correct answer is, particularly for closed questions. I’ve not found a reliable way to stop it from doing this - let me know if you crack it.

3.5 Test yourself

You can now ask it to generate questions for you by typing the question type, e.g., MCQ or coding problem (CP). If the questions seem too easy or too hard, you can adjust them by using calibrate: easier or calibrate: harder.

Note: Activity 3

Ask it for one question of each type and work through them. Reflect on whether they are challenging enough, if they align with your course content, and how they might differ from practice questions you have been given by a human.

3.6 Be critical

Effective practice testing is not just about answering more questions; it is about engaging the cognitive processes that drive learning. Three ideas are central here. First, metacognition: accurate monitoring and control of learning (calibration) helps you choose the right strategies and difficulty. Second, desirable difficulties: tasks that are effortful—retrieval, generation, discrimination—strengthen long-term retention when paired with feedback. Third, self-explanation: articulating why an answer is right or wrong deepens understanding and transfer. Use AI to amplify these processes, not to bypass them. Treat every AI-generated item as an opportunity to retrieve, explain, and calibrate.

The questions AI generates can be useful, but they can also introduce illusions of fluency and miscalibration.

3.6.1 Multiple-choice / TOF / SAQ questions

  • Sometimes the answers are simply wrong. If you challenge the AI, it will usually correct itself—but it will also agree with you if you claim an answer is wrong when it is not. The risk is that you accidentally encode misinformation. Remember: AI does not “know” anything; it is a sophisticated pattern-matcher, not a reliable authority.
  • It may generate questions about functions or concepts not covered in your course, which can cause confusion and unnecessary anxiety.
  • Occasionally it poses a question with multiple correct answers without making this clear, which is frustrating.
  • It may overemphasise particular topics or functions unless you explicitly direct it to vary the focus.

3.6.2 Coding problems

  • The examples are not always reproducible. For instance, it might assume the existence of a dataset with variables called “number” and “price” but provide no such dataset, making it impossible to run the code. You can still attempt the problem, but this adds extra difficulty—especially for beginners. A quick check you can run before starting is sketched after this list.
  • It sometimes uses functions or approaches you have not been taught, such as defaulting to Base R instead of tidyverse.
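
One way to guard against non-reproducible coding problems is to confirm, before you start, that the dataset and columns the AI mentions actually exist. A minimal sketch, assuming the tidyverse is loaded and using the invented “number” column from the example above:

library(tidyverse)

# does the dataset exist, and does it contain the columns the question refers to?
exists("diamonds")
names(diamonds)
c("number", "price") %in% names(diamonds)  # "number" is not a real column; "price" is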

3.6.3 Error mode

  • In the first iteration of this book, AI could not create error mode problems - the code either ran fine or the errors were so stupid and obvious that they were of no educational benefit. Models have improved, and they can now generate plausible errors, which is an interesting sign of progress.
  • Even so, AI often hints heavily at the error or simply tells you the answer, which reduces the learning benefit.
  • Whilst it can now generate plausible errors, it doesn’t have the benefit of years of teaching experience. When we design error mode questions, they’re based on our knowledge of what students frequently get wrong, so AI error mode isn’t always as targeted, and therefore as useful, as questions written by an expert educator. Maybe our jobs are safe for a little while longer.
Tip: Key Takeaways
  1. Regular retrieval strengthens memory more than re-reading or highlighting.

  2. Choose the right format for questions: MCQs/TOF are good for recognition and breadth but weaker for recall. SAQs/FITB are better for active recall and vocabulary but carry a risk of rote learning. Coding problems/Error mode are closest to authentic tasks, but require more effort and background knowledge.

  3. Paste ILOs, function lists, or glossaries into Copilot to ensure it’s specific to what you are studying. Avoid imaginary datasets—stick to built-in ones like mtcars, penguins, or iris.

  4. AI outputs can be wrong, misleading, or oddly focused. Correcting them is part of the learning process and strengthens metacognition.

  5. AI is a tool, not a teacher. It lacks expert judgement about what learners typically get wrong. Use it to supplement, not replace, structured practice and seeking help from your lecturers and tutors.