Selecting a Generative AI Tool for Mixed-Methods Coding

📒 Table of Contents:

Anthropic AI
ChatGPT
LLama
Gemini
Reflections

📂 Files Generated:

Data Classification - Anthropic AI.ipynb

Data Classification - ChatGPT euphemism detection.ipynb

Data Classification - Llama AI.ipynb

Data Classification - Gemini.ipynb

Data Classification - Gemini v1 establishing coding groups based on TF-IDF words.ipynb

Prompt Iteration

Prompt	AI Model	Ref #
Does this passage reference COVID-19 only reply with yes or no, and a direct quote from the passage, no longer than 100 characters supporting your claim. The passage is: 'we talked about how are teachers doing with instruction this semester with the learning in person hybrid and they broke out into small groups between again we had over 20 people in the meeting they broke out into small groups and discussed pluses and deltas around our strengths um and things we need to grow on and we kind of looked at each of our questions this time so we had each of our questions we broke down kind of where our strengths and deltas were and got good feedback from all the schools'"	Anthropic AI	1
Does this passage reference (could be euphemistically) COVID-19 only reply with yes or no, and a direct quote from the passage, no longer than 100 characters supporting your claim. The passage is: 'we talked about how are teachers doing with instruction this semester with the learning in person hybrid and they broke out into small groups between again we had over 20 people in the meeting they broke out into small groups and discussed pluses and deltas around our strengths um and things we need to grow on and we kind of looked at each of our questions this time so we had each of our questions we broke down kind of where our strengths and deltas were and got good feedback from all the schools'"	Anthropic AI	2
Identify if the passage could be talking about COVID-19, provide a 3-5 word summary about the passage, a score of your confidence between 0-5, and a direct quote from the passage that informs your decision that is 5-10 words it does not need to be in quotes. Use this JSON dictionary: Covid = {'covid-19':bool,'summ':str,'conf':int,'ref':str} Return a list[Covid] The passage is 'four the superintendent report with dr pamela swanson well good evening mr president members of the board it has been a long and difficult journey with many complex layers but last thursday our students and staff returned to in-person learning with an online option to those who registered for it and so far so good with some minor scheduling bumps in the in the in the long and winding road that i just described so thanks to your leadership and a consistent message at the school and district levels"	Anthropic AI & Gemini	3
'You are a researcher at an academic institution who is coding school board meeting passages. Your job is to first identify if the passage could be about COVID-19,
this could be through euphemisms or direct references. Then provide a confidence score between 0-5 (0 meaning COVID could be in the passage, and 5 being there is explicit reference).
Next provide the euphemism in one to five words. If there is no euphemism present provide the direct reference in one to five words under the euphimism key. Next provide a summary of the topic of the passage in one to four words. Return your output in a Python dictionary structure. If COVID-19 is not present in the passage, generate a dictionary with the keys “COVID-19” and “Summary”.'''	Chat-GPT	4
Provide a 3-5 word summary of the passage, and if the passage is about COVID-19, a score of your confidence between 0-5, Use this JSON schema: Illness = {'covid-19':bool,'summ':str,'conf':int,}. Return a list[Covid] The passage is:
Gemini	5
There are nine categories 0. Data ; 1. Problematization ; 2. Levels of Influence; 3. Money / Resources; 4. Topics; 5. Pedagogy; 6. Technology; 7. COVID-19; 8. Meeting Terms; 9. Other. Please decide what category the word fits in. Use this JSON Scehma Word = {'word':word,'code':int}. The word value should be the word passed into the prompt, and the code integer should be in response to the nine categories. The word to classify is "pandemic".	Gemini	6

1. Anthropic AI

📎 Documentation for the Anthropic AI is linked here. In this section, it is referred to interchangeably as Claude (its persona).

The structure for this notebook is as follows:

Library Import - I have a stack of libraries which I import throughout all of my libraries. There are no specific libraries imported at this point, but later in the notebook the Anthropic client is imported.

Testing the Client - The module is instantiated with a low temperature (recommended through the documentation), with a role of an ‘academic researcher’. The test prompt provided is “What does a star student look like? Respond with five words”. Providing Claude with the prompt asking for ‘look’ was my attempt in eliciting bias from the AI. It’s response was ‘Motivated, diligent, inquisitive, organized, and engaged’.

Data Classification - Anthropic AI (1).html

Prompt Testing - Claude was then tested again with Prompt #1 ( Does this passage reference COVID-19 only reply with yes or no, and a direct quote from the passage, no longer than 100 characters supporting your claim.) but it was unable to identify any reference to COVID-19. Its response was ‘[Text Block(text='No. There is no direct reference to COVID-19 in this passage. The text mentions "learning in person hybrid" but does not explicitly mention COVID-19 or the pandemic.', type='text')]'. Claude was able to identify ‘learning in person hybrid’ could have been related to COVID-19, but indicated there was no reference because COVID-19 (or a similar coded word) was not present in the sentence. The prompt was changed to include (could be euphemistically) to identify references like the one above (Prompt 2). Euphemistically, is representative of many other nuances within the school board, thus why the AI is selected in the models ability to detect nuances in speech without needing a direct reference. The second iteration of the prompt (with the same passage as in the first prompt) returns a successful ‘Yes’ from Claude. The next iteration of the prompt was focusing on applying the structured outputs (documentation here). Claude is asked in Prompt 3 to generate a summary, confidence, and a reference quote in JSON format.

Claude returned the output on the left, with an identification, summary, confidence, and reference. This was structured in the output desired in the prompt, but was returned in text format with additional text to preface the response. While this output is accurate, it is not formatted in an easy way to extract alter. The fourth iteration of prompt testing asked Claude to develop a workflow that is iterative, by providing it with two prompts. Claude behaved, but it wasted computational resources because of its excessive metadata provided in the response. In comparison to Gemini (tested prior to Claude, but positioned sequentially last in this) Claude performed more accurately in comparison for classification.

Here's the response in the requested format:

[
    {
        "covid-19": true,
        "summ": "Return to in-person learning",
        "conf": 3,
        "ref": "students and staff returned to in-person learning"
    }
]

Importing Data - Transcripts are iterated on and extracted in a list separated with newline characters. Thirty snippets of the transcript were passed into Claude, but it misbehaved greatly. Often times, the data would be missing in the iteration for loop cycle (after reviewing the code, I believe it may have been an error on my end!). Regardless, the reference quote was always the first three or four words from the reference quote. This indicates the model may not be focusing on the entire quote before developing a classification. In some cases, the model would hallucinate and add various keys in the dictionary that were not specified in the prompt. In addition, the extra outputs were long and a waste of computational resources.

Structuring Results - The results are then extracted into a data frame, and stored in the excel sheet which the other results are stored in (attached above, in its final form).

Reaction - While Claude is a capable generative AI, its inconsistency in following the desired prompt output is its main limitation. This may create challenges later on when aggregating and structuring the results, if there are random keys throughout the entirety of the response. Claude had an accuracy rate less than Chat-GPT and comparable to Gemini. It had strong euphemism detection as well. It should be noted that the documentation is sparse, and challenging to interpret and troubleshoot.

Prompt Iteration

1. Anthropic AI

2. Chat-GPT