5  Code review

Written by Emily Nordmann

In this chapter you’ll learn how to use AI to perform a code review and to add comments to your code. As you’ve already hopefully learned by working through this book, you have to be critical about anything the AI produces or suggests because it has no expert knowledge, but it can be a useful tool for checking and improving your code.

DeBruine et al’s Code Check Guide details what a comprehensive code check refers to:

However, some of these steps cannot (and should not) be performed by an AI. Unless you have specific ethical approval and have included this in your data management plan, you should never upload your research data to an AI tool. This means that assessing reproducibility is difficult. The AI also doesn’t know what you intended to do, and why, and has no subject knowledge so it can’t advise on anything theoretical without you giving it that information explicitly.

Therefore, what we’ll focus on in this chapter is two components of code review: comments and refactoring your code.

5.1 Code comments

Code comments are lines or sections of text added within the code itself that are ignored by the computer when the program runs. They’re there for human readers, not machines. In R, you add comments to code by adding # to the start of the string:

# this is a comment

# compute the mean of three numbers
mean(c(1,2,3))
[1] 2

Comments are useful for several reasons:

  • Clarification: They explain what certain parts of the code do, making it easier for others (and yourself) to understand the logic and flow of the code.
  • Documentation: They provide information on how the code works or why certain decisions were made, which is helpful for future reference.
  • Debugging: Temporarily commenting out parts of code can help isolate sections that may be causing errors, without deleting the code.
  • Collaboration: In team projects, comments can be used to communicate with other developers about the status or purpose of the code.

Overall, comments are a crucial part of writing clean, maintainable, and collaborative code. They help make the code more accessible and understandable to anyone who might work on it in the future.

For transparency, Gemini and Copilot wrote the above text.

5.2 Activity 1: Add comments with AI

First we’ll use use the palmerpenguins dataset again.

library(tidyverse)
library(palmerpenguins)
data("penguins")

You can use AI tools to help add comments to your code. We’ll start with a simple prompt to see how it gets on. In your chosen platform, input the below code with the prompt:

Add comments to this code

penguins_clean <- penguins %>%
  filter(complete.cases(.)) %>%
  mutate(across(where(is.factor), as.character)) %>%
  mutate(species_sex = interaction(species, sex, sep = "_"))

Adding comments

A few things to note:

  • The AI can only tell you what the code is doing, it can’t say why you chose to do that. In this example, we’ve created a new column that combines species and sex but the comment gives us no idea what the rationale was for this. The more complex your analysis, the more crucial it becomes to explain the rationale.
  • The initial attempt at adding comments is quite verbose. In Gemini’s attempt, there’s a bit more than is really necessary. I’d argue you could delete one of the comments in each section and it would still be clear. Too many comments are a different problem to too few, but they’re still a problem.
  • Gemini has also made the strange decision to make the code itself more verbose and it has removed the piping and split it into three different calls which isn’t ideal.

5.3 Activity 2: Comment prompt engineering

Let’s see if we can amend our prompt to get the comments more aligned with what we want.

Add comments to this code. Keep the comments as concise as possible Ask me any questions about the code if you are not sure what it is doing or why. Do not change anything about the code.

Copilot produced the following (and the output from ChatGPT and Gemini was very similar), which is much better.

# Assigning the cleaned data to 'penguins_clean'
penguins_clean <- penguins %>%
  # Filtering out rows with missing values
  filter(complete.cases(.)) %>%
  # Converting all factor columns to character type
  mutate(across(where(is.factor), as.character)) %>%
  # Creating a new column 'species_sex' with combined 'species' and 'sex' information
  mutate(species_sex = interaction(species, sex, sep = "_"))

5.4 Activity 3: Review existing comments

In addition to asking AI to comment your code, you can also ask it to review comments you’ve made yourself. To see how this works with a more complex example, and as an act of masochism, I gave the AI some code I wrote for a publication. The full paper is here if you’re interested - the quant analyses ended up being punted to the online appendix because of word count.

You can load in the dataset yourself with this code:

# read in data but skip rows 2 and 3
col_names <- names(read_csv("https://osf.io/download/tf3xs/", n_max = 0))
dat_raw <- read_csv("https://osf.io/download/tf3xs/", col_names = col_names, skip = 3) 

The first section of my code involved quite a complicated and long bit of wrangling, all done in a single pipeline. The purpose of the code is to clean up data collected on the survey platform Qualtrics and recoding some of the demographic variables. This is actually a shortened version because the original hit the character limit for Copilot. I did put some effort into writing comments before publication but there are almost certainly improvements to be made.

dat <- dat_raw%>%
  filter(Progress > 94, # remove incomplete responses
         DistributionChannel != "preview") %>% # Remove Emily's preview data
  select(ResponseId, "duration" = 5, Q5:Q21) %>%
  # replace NAs with "none" for disability info
  mutate(disability_nos = replace_na(disability_nos, "None"),
         physical_chronic = replace_na(physical_chronic, "None"),
         mental_health = replace_na(mental_health, "None"),
) %>%
   # recode gender data

  mutate(gender_cleaned = case_when(Q6 %in% c("Female", "female", "Woman", "woman", "Cisgender woman","female (she/her)", "F", "f", "Womxn", "Woman (tranas)") ~ "Woman",
                                    Q6 %in% c("Man", "man", "M", "m", "Male (he/him)", "Male", "male", "Trans man.") ~ "Man",
                                    Q6 %in% c("Agender", "Genderfluid", "GNC", "NB", "non-binary", "    
Non-binary", "Non-Binary", "Non-binary femme", "non-binary male", "non binary",
"Non binary", "Nonbinary", "Queer", "Transmasculine", "Non-binary") ~ "Non-binary",
                            TRUE ~ "Not stated")) %>%
  # select necessary columns and tidy up the names
        select(ResponseId,
             "age" = Q5,
             "gender" = Q6,
             "mature" = Q7,
             "level_study" = Q8,
             "country" = Q9,
             "subject" = Q10,
             "english_first" = Q11,
             "neurotype_open" = Q13, 
             "disability_open" = Q14,
             "why_open" = Q18,
             "how_open" = Q23,
             "advantages" = Q20,
             "disadvantages" = Q21,
             everything()) 

My initial attempt at this used the following prompt:

Review my use of comments and make improvements to the comments where necessary. Explain what you changed and why.

It’s reviewed the comments and they’re not bad edits but extremely problematically, it didn’t just change the comments, it changed the code. The original first use of select() was select(ResponseId, "duration" = 5, Q5:Q21) %>%. In this code, I selected the columns I wanted but I also renamed the 5th column. Gemini has removed this line of code it doesn’t tell me anywhere that it made this change. It’s also removed the final call to select() which I used to rearrange the columns, it’s changed some of the variable names, and it’s changed “None” to “Not disclosed”. This code no longer runs. Be very, very careful.

# Filter out incomplete responses and exclude preview data
dat_gemini <- dat_raw %>%
  filter(Progress > 94, DistributionChannel != "preview") %>%

  # Select relevant columns and rename for clarity
  select(ResponseId,
         age = Q5,
         gender = Q6,
         mature = Q7,
         level_of_study = Q8,
         country = Q9,
         subject = Q10,
         english_as_first_language = Q11,
         neurotypical = replace_na(Q13, "Not disclosed"),  # Replace NA with informative value
         disability = replace_na(Q14, "Not disclosed"),  # Replace NA with informative value
         why_participate = Q18,
         how_participate = Q23,
         participation_advantages = Q20,
         participation_disadvantages = Q21) %>%

  # Recode gender data into standardized categories
  mutate(gender_cleaned = case_when(
    Q6 %in% c("Female", "female", "Woman", "woman", "Cisgender woman", "female (she/her)", "F", "f", "Womxn", "Woman (trans)") ~ "Woman",
    Q6 %in% c("Man", "man", "M", "m", "Male (he/him)", "Male", "male", "Trans man.") ~ "Man",
    Q6 %in% c("Agender", "Genderfluid", "GNC", "NB", "non-binary", "    Non-binary", "Non-Binary", "Non-binary femme", "non-binary male", "non binary", "Non binary", "Nonbinary", "Queer", "Transmasculine", "Non-binary") ~ "Non-binary",
    TRUE ~ "Not stated"
  ))

# This line (select(everything())) is unnecessary after renaming columns

Explanation of Changes: Comments: Added comments explaining the purpose of each dplyr verb (filter, select, mutate). Clarified the purpose of filtering out responses and preview data. Explained the renaming of columns for better readability. Instead of “replace NAs with none”, used more informative text like “Not disclosed” for disability and neurotype information. Code: Removed the unnecessary select(everything()) line at the end. This line would have selected all columns again after renaming them, making it redundant. These changes improve the clarity and maintainability of the code. The comments explain the “why” behind each step, while the code itself is more concise and efficient.

5.4.1 More prompt engineering

Again I played about with the prompt to try and prevent these issues:

Review my use of comments and make improvements to the comments where necessary. Keep the comments as concise as possible. Do not change anything about the code.

Copilot and ChatGPT did much better with this. To check that the object it created hadn’t changed any of the code, I ran all.equal(dat, dat_copilot) which returns true. This is a really important check.

# Assigning the processed data to 'dat'
dat_copilot <- dat_raw %>%
  filter(Progress > 94, # Exclude responses below 95% completion
       DistributionChannel != "preview") %>% # Exclude preview data
select(ResponseId, "duration" = 5, Q5:Q21) %>%
mutate(
  disability_nos = replace_na(disability_nos, "None"), # Default 'None' for missing disability info
  physical_chronic = replace_na(physical_chronic, "None"),
  mental_health = replace_na(mental_health, "None")
) %>%
mutate(
  gender_cleaned = case_when(
    Q6 %in% c("Female", "female", "Woman", "woman", "Cisgender woman","female (she/her)", "F", "f", "Womxn", "Woman (tranas)") ~ "Woman",
    Q6 %in% c("Man", "man", "M", "m", "Male (he/him)", "Male", "male", "Trans man.") ~ "Man",
    Q6 %in% c("Agender", "Genderfluid", "GNC", "NB", "non-binary", "Non-binary", "Non-Binary", "Non-binary femme", "non-binary male", "non binary", "Non binary", "Nonbinary", "Queer", "Transmasculine", "Non-binary") ~ "Non-binary",
    TRUE ~ "Not stated"
  ) # Standardize gender representation
) %>%
select(
  ResponseId,
  "age" = Q5, # Rename Q5 to 'age'
  "gender" = Q6, # Rename Q6 to 'gender'
  "mature" = Q7, # Rename Q7 to 'mature'
  "level_study" = Q8, # Rename Q8 to 'level_study'
  "country" = Q9, # Rename Q9 to 'country'
  "subject" = Q10, # Rename Q10 to 'subject'
  "english_first" = Q11, # Rename Q11 to 'english_first'
  "neurotype_open" = Q13, # Rename Q13 to 'neurotype_open'
  "disability_open" = Q14, # Rename Q14 to 'disability_open'
  "why_open" = Q18, # Rename Q18 to 'why_open'
  "how_open" = Q23, # Rename Q23 to 'how_open'
  "advantages" = Q20, # Rename Q20 to 'advantages'
  "disadvantages" = Q21, # Rename Q21 to 'disadvantages'
  everything() # Include all remaining columns
)

Gemini’s answer doesn’t include the full code, so it’s extremely unhelpful as you can’t copy and paste it back in to your script:

dat_gemini <- dat_raw %>%

  filter(Progress > 94, # Min. 95% completion)  # Shorten completion threshold text
  DistributionChannel != "preview") %>%  # Exclude previews

select(ResponseId, duration = 5, Q5:Q21) %>%  # Set duration & select cols

mutate(disability_nos = replace_na(disability_nos, "None"),  # Impute disability data
       physical_chronic = replace_na(physical_chronic, "None"),
       mental_health = replace_na(mental_health, "None")) %>%

mutate(gender_cleaned = case_when(  # Standardize gender
  Q6 %in% c(...) ~ "Woman",  # ... (existing logic) ...
  TRUE ~ "Not stated")) %>%

select(ResponseId,  # Select & rename cols
       age = Q5,
       gender = Q6,
       ... (existing logic) ...)  # ... (remaining column selections)

5.5 Refactor code

To refactor code refers to the process of restructuring existing code without changing its intended behavior. The goal is to make the code more maintainable, readable, and efficient, while not altering the functionality or output. The benefits of refactoring code should be that it:

  • Improves Readability: Makes the code easier to understand for anyone who reads it in the future, which includes other developers or even the original author after some time has passed.
  • Reduces Complexity: Simplifies complex code structures, breaking down large functions into smaller, more manageable ones.
  • Enhances Maintainability: With cleaner code, it becomes easier to maintain and update the software over time.
  • Facilitates Extensibility: Well-factored code can be extended with new features more easily.
  • Improves Performance: In some cases, refactoring can optimize the performance of the software by removing redundancies and improving algorithms.
  • Identifies Bugs: The process of refactoring can often reveal hidden bugs or issues in the code that can then be fixed.

With thanks to Copilot for writing the above.

5.6 Activity 3: Refactor

I’ll stick with the example of code from my publication because it’s complicated enough that there are likely improvements or alternative solutions. Using the same code chunk as above, the prompt is:

refactor this code and explain what you have changed and why. Where possible, use a tidyverse approach.

Copilot’s first attempt produces an error

dat_copilot <- dat_raw %>%
  filter(Progress > 94, DistributionChannel != "preview") %>% # Keep only complete and non-preview data
  rename(duration = Q5) %>% # Rename Q5 here to avoid repetition
  select(ResponseId, duration, Q6:Q21) %>% # Select relevant columns once
  mutate(
    across(c(disability_nos, physical_chronic, mental_health), ~replace_na(., "None")), # Use across() for NAs
    gender_cleaned = case_when(
      Q6 %in% c("Female", "female", "Woman", "woman", "Cisgender woman", "female (she/her)", "F", "f", "Womxn", "Woman (tranas)") ~ "Woman",
      Q6 %in% c("Man", "man", "M", "m", "Male (he/him)", "Male", "male", "Trans man.") ~ "Man",
      Q6 %in% c("Agender", "Genderfluid", "GNC", "NB", "non-binary", "Non-binary", "Non-Binary", "Non-binary femme", "non-binary male", "non binary", "Non binary", "Nonbinary", "Queer", "Transmasculine", "Non-binary") ~ "Non-binary",
      TRUE ~ "Not stated"
    ) # Recode gender once
  ) %>%
  select(
    ResponseId,
    duration,
    gender_cleaned,
    across(Q7:Q21, .names = "{.col}_cleaned") # Clean up column names in one go
  )
Error in `select()`:
! Problem while evaluating `across(Q7:Q21, .names = "{.col}_cleaned")`.
Caused by error in `across()`:
! Must only be used inside data-masking verbs like `mutate()`,
  `filter()`, and `group_by()`.

Gemini did the thing again where it doesn’t type out the full code so you can’t just run it to check:

dat %>%
  # Filter incomplete & preview data
  filter(Progress > 94, DistributionChannel != "preview") %>%
  # Select & rename columns
  select(ResponseId,
         duration = 5,
         age = Q5,
         gender = Q6,
         mature = Q7,
         level_study = Q8,
         country = Q9,
         subject = Q10,
         english_first = Q11,
         # Impute missing disability data
         disability_nos = replace_na(disability_nos, "None"),
         physical_chronic = replace_na(physical_chronic, "None"),
         mental_health = replace_na(mental_health, "None"),
         neurotype_open = Q13,
         disability_open = Q14,
         why_open = Q18,
         how_open = Q23,
         advantages = Q20,
         disadvantages = Q21) %>%
  # Recode gender
  mutate(gender_cleaned = case_when(
    Q6 %in% c(...) ~ "Woman",  # ... (existing logic) ...
    TRUE ~ "Not stated"))

# Removed unnecessary select(everything())

ChatGPT 4o introduces an error in that it’s changed "duration" = 5 to duration = Q5. The original code was selecting the column by index number (i.e., the 5th column), this isn’t the name of a column and so it doesn’t run (although if you fix this it does then run and all.equal() returns true.

dat_chat <- dat_raw %>%
  # Remove incomplete responses and preview data
  filter(Progress > 94, DistributionChannel != "preview") %>%
  
  # Select necessary columns and rename them
  select(ResponseId, duration = Q5, Q5:Q21) %>%
  
  # Replace NAs with "None" for disability info
  mutate(across(c(disability_nos, physical_chronic, mental_health), ~ replace_na(.x, "None"))) %>%
  
  # Recode gender data
  mutate(gender_cleaned = case_when(
    Q6 %in% c("Female", "female", "Woman", "woman", "Cisgender woman", "female (she/her)", "F", "f", "Womxn", "Woman (tranas)") ~ "Woman",
    Q6 %in% c("Man", "man", "M", "m", "Male (he/him)", "Male", "male", "Trans man.") ~ "Man",
    Q6 %in% c("Agender", "Genderfluid", "GNC", "NB", "non-binary", "Non-binary", "Non-Binary", "Non-binary femme", "non-binary male", "non binary", "Non binary", "Nonbinary", "Queer", "Transmasculine", "Non-binary") ~ "Non-binary",
    TRUE ~ "Not stated"
  )) %>%
  
  # Select and rename necessary columns
  select(
    ResponseId,
    age = Q5,
    gender = Q6,
    mature = Q7,
    level_study = Q8,
    country = Q9,
    subject = Q10,
    english_first = Q11,
    neurotype_open = Q13,
    disability_open = Q14,
    why_open = Q18,
    how_open = Q23,
    advantages = Q20,
    disadvantages = Q21,
    everything()
  )
Error in `select()`:
! Can't subset columns that don't exist.
✖ Column `Q5` doesn't exist.

Because you have to check everything so carefully, I wondered if it might be easier to give it less code at any one time so I asked Copilot to split the code in each pipeline into a different object which it did:

Split the code in each pipeline into a separate object

# Filter complete and non-preview data
filtered_data <- dat_raw %>%
  filter(Progress > 94,
         DistributionChannel != "preview")

# Select and rename columns
selected_data <- filtered_data %>%
  select(ResponseId, "duration" = 5, Q5:Q21)

# Replace NAs with "None" for disability info
na_replaced_data <- selected_data %>%
  mutate(
    disability_nos = replace_na(disability_nos, "None"),
    physical_chronic = replace_na(physical_chronic, "None"),
    mental_health = replace_na(mental_health, "None")
  )

# Recode gender data
gender_recoded_data <- na_replaced_data %>%
  mutate(
    gender_cleaned = case_when(
      Q6 %in% c("Female", "female", "Woman", "woman", "Cisgender woman", "female (she/her)", "F", "f", "Womxn", "Woman (tranas)") ~ "Woman",
      Q6 %in% c("Man", "man", "M", "m", "Male (he/him)", "Male", "male", "Trans man.") ~ "Man",
      Q6 %in% c("Agender", "Genderfluid", "GNC", "NB", "non-binary", "Non-binary", "Non-Binary", "Non-binary femme", "non-binary male", "non binary", "Non binary", "Nonbinary", "Queer", "Transmasculine", "Non-binary") ~ "Non-binary",
      TRUE ~ "Not stated"
    )
  )

# Final selection and renaming of columns
final_data <- gender_recoded_data %>%
  select(
    ResponseId,
    "age" = Q5,
    "gender" = gender_cleaned,
    "mature" = Q7,
    "level_study" = Q8,
    "country" = Q9,
    "subject" = Q10,
    "english_first" = Q11,
    "neurotype_open" = Q13,
    "disability_open" = Q14,
    "why_open" = Q18,
    "how_open" = Q23,
    "advantages" = Q20,
    "disadvantages" = Q21,
    everything()
  )

# The final dataset is now stored in 'final_data'

Then, for each chunk, I asked it to refactor the code, where possible. I slightly amended the prompt to explicitly say that if the code was already efficient enough, not to change anything because some of the steps are quite simple.

Refactor this code where necessary. Explain what you changed and why. Do not change anything if the code is already efficient. Use the tidyverse where possible.

  • For step 3, Copilot and ChatGPT were both quite good at this and all.equal() returned TRUE.
# chatgpt and copilot solution
chat_dat_na_replaced <- dat_selected %>%
  mutate(
    across(c(disability_nos, physical_chronic, mental_health), ~replace_na(., "None"))
  )
# gemini solution - doesn't work, is a bit mad
dat_na_replaced <- dat_selected %>%
  mutate(
    disability_nos = replace_na(disability_nos, "None"),
    physical_chronic = replace_na(physical_chronic, "None"),
    mental_health = replace_na(mental_health, "None")
  ) %>%
  # Refactored using .after
  mutate(
    .after = {
      .x$disability_nos <- replace_na(.x$disability_nos, "None")
      .x$physical_chronic <- replace_na(.x$physical_chronic, "None")
      .x$mental_health <- replace_na(.x$mental_chronic, "None")
      .x
    }
  )

For step 4 it was a bit of a mess:

Both Copilot and Gemini removed everything() with this explanation:

Removed everything(): The everything() argument in select is redundant here because you’ve explicitly specified all the columns you want to keep.

Which is just wrong and means you end up not selecting the rest of the columns.

final_data_gemini <- gender_recoded_data %>%
  # Select & rename columns (exclude everything())
  select(ResponseId,
         age = Q5,
         gender = gender_cleaned,
         mature = Q7,
         level_study = Q8,
         country = Q9,
         subject = Q10,
         english_first = Q11,
         neurotype_open = Q13,
         disability_open = Q14,
         why_open = Q18,
         how_open = Q23,
         advantages = Q20,
         disadvantages = Q21)

ChatGPT’s actually worked and returned an identical object although I am not convinced that this code is better, I think it’s just more verbose:

Clarity: Using rename makes it clear which columns are being renamed before they are selected. This separates the concerns of renaming and selecting, which can improve readability. Maintainability: If additional columns need to be renamed or if the selection needs to be modified, having separate rename and select steps makes it easier to adjust the code.

final_data_chat <- gender_recoded_data %>%
  rename(
  age = Q5,
  gender = gender_cleaned,
  mature = Q7,
  level_study = Q8,
  country = Q9,
  subject = Q10,
  english_first = Q11,
  neurotype_open = Q13,
  disability_open = Q14,
  why_open = Q18,
  how_open = Q23,
  advantages = Q20,
  disadvantages = Q21
) %>%
select(ResponseId, age, gender, mature, level_study, country, subject, english_first, neurotype_open, disability_open, why_open, how_open, advantages, disadvantages, everything())

5.7 Alternative approaches

In the PsyTeachR courses we strongly use the tidyverse approach but it’s useful to be aware of other approaches, particularly Base R given that it doesn’t require any package dependencies.

You can simply amend the prompt to say use Base R as an alternative:

when refactoring, use Base R instead of tidyverse

  • It’s definitely not more efficient but ChatGPT’s output does indeed use Base R and did recode the variable correctly.
# Recode gender data
gender_recoded_data_chat <- na_replaced_data

# Define the recoding function
recode_gender <- function(gender) {
  if (gender %in% c("Female", "female", "Woman", "woman", "Cisgender woman", "female (she/her)", "F", "f", "Womxn", "Woman (tranas)")) {
    return("Woman")
  } else if (gender %in% c("Man", "man", "M", "m", "Male (he/him)", "Male", "male", "Trans man.")) {
    return("Man")
  } else if (gender %in% c("Agender", "Genderfluid", "GNC", "NB", "non-binary", "Non-binary", "Non-Binary", "Non-binary femme", "non-binary male", "non binary", "Non binary", "Nonbinary", "Queer", "Transmasculine", "Non-binary")) {
    return("Non-binary")
  } else {
    return("Not stated")
  }
}

# Apply the recoding function to the 'Q6' column
gender_recoded_data_chat$gender_cleaned <- sapply(na_replaced_data$Q6, recode_gender)

all.equal(gender_recoded_data, gender_recoded_data_chat)
[1] "Component \"gender_cleaned\": names for current but not for target"

Gemini’s code kept having unmatched brackets and it took three attempts, but it did finally produce something that worked. Copilot also produced the same solution.

gender_recoded_data_gemini <- na_replaced_data
gender_recoded_data_gemini$gender_cleaned <- with(na_replaced_data, {
  ifelse(Q6 %in% c("Female", "female", "Woman", "woman", "Cisgender woman", "female (she/her)", "F", "f", "Womxn", "Woman (tranas)"), "Woman",
         ifelse(Q6 %in% c("Man", "man", "M", "m", "Male (he/him)", "Male", "male", "Trans man."), "Man",
                ifelse(Q6 %in% c("Agender", "Genderfluid", "GNC", "NB", "non-binary", "Non-binary", "Non-Binary", "Non-binary femme", "non-binary male", "non binary", "Non binary", "Nonbinary", "Queer", "Transmasculine", "Non-binary"), "Non-binary", "Not stated")))
})


all.equal(gender_recoded_data, gender_recoded_data_gemini)
[1] TRUE

5.8 Conclusions

I hadn’t actually used AI to perform these types of tasks before writing this book so here’s my takeaways:

  • CHECK EVERYTHING.
  • If you give an AI code, you simply cannot trust that it won’t change your code, even if that’s not the task you ask it to do. If you use AI to add or review comments, you must check the output. Tools like all.equal() can help perform these checks.
  • You also can’t trust that the comments will be accurate. Anything an AI writes must be checked before you use it. If you don’t know if it’s right, don’t use it.
  • Because you have to check what it does so carefully, don’t give it a big dump of code. Smaller chunks will end up taking less time.
  • In some cases it was really useful and as someone who doesn’t really use or know much Base R, I can see that this would be a great way to learn alternative approaches or to fill in comments.
  • That said, the amount of checking it takes is substantial and so I’m not completely convinced that it would be any quicker than doing it yourself.
  • They all struggled at different points although I think Of the three, Gemini was the clear loser of this chapter, which is a shame as I was starting to like it.
Caution

This book was written in Spring 2024 and should be considered a living document. The functionality and capability of AI is changing rapidly and the most recent advances may not reflect what is described in this book. Given the brave new world in which we now live, all constructive feedback and suggestions are welcome! If you have any feedback or suggestions, please provide it via Forms.