Welcome to the third entry in the experiment in which I ask ChatGPT the same questions I present to my students in ‘Music & Meaning In Our Lives’, a liberal arts course I teach at the University of Michigan. The first post introduced the series and centered around the question, ‘what is music?’. And, the second post challenged ChatGPT to define The Parameters Of Music, which is the lightly-technical, intentionally accessible analytical framework I use for analyzing musical examples with my liberal arts students.
This week’s entry changed wildly as I worked on it. Initially, I thought I would not be able to interact with ChatGPT at all because I experienced a few different errors when I tried to log in. With the program on the fritz, I came up with an alternative plan: to reflect on the unique nature of the course I teach and the advantages its content and design give me over the software’s capacity to help students cheat.
But, then, the errors stopped.
Once I was back in, I was surprised to notice a few things were different: I could not access my past exchanges, and, when I re-entered prompts I had used previously, the responses were not the same.
That’s when I realized what must have happened: OpenAI updated ChatGPT.
A stronger, more powerful ChatGPT?
To keep pace with my class, my plan for this week involved asking the program to compare specific musical examples and answer questions about Robin James’ fabulous book The Sonic Episteme, part of a chapter from which serves as the first reading I assign my students. The comparative ‘listening’ (remember, ChatGPT cannot listen to music) questions are based on the image below, which I use to visualize one goal of the music analysis I do with my students: making evidence-based material connections across traditional stylistic/genre boundaries.
I always share this image in the first class session of ‘Music & Meaning In Our Lives’, because it illustrates so much about what we do for the rest of the term. For those of you who have followed along with this series, you should recognize that the terms bridging these stylistically disparate examples are The Parameters of Music. Everything is oriented around the song “Nickerson’s Theme” by Ahab, a German funeral doom metal band (one of my favorites!), and the lines illustrate shared characteristics I have identified between this song and others. For example, one distinctive feature of “Nickerson’s Theme” is singer/guitarist Daniel Droste’s vocal performance, which begins in a more conventional ‘clean’ sound and later features his excellent ‘death growl’ technique. Through this use of multiple vocal timbres, I am able to connect “Nickerson’s Theme” to Mariah Carey’s “Emotions” and Ja Rule/Ashanti’s “Mesmerize”, which proffer characteristically similar vocal performances.
On January 8, I asked ChatGPT to discuss the timbre in “Nickerson’s Theme” and it demurred:
This week, however, the program’s new version delivered a strong response when I asked it to do the more advanced task of comparing the content of two different songs:
The software’s writing here contains several important errors, but it is more impressive than any of the ‘good’ work I have previously received. There are issues with the overall argument as well as certain specific details: for instance, The Boats of the Glen Carrig is not the Ahab album on which “Nickerson’s Theme” appears. Nonetheless, as the above graphic shows, I fully agree that rhythm is a valid point of connection between these two examples (it’s almost as if ChatGPT was in class last week with my human students!).
Despite these improvements, the software’s overall argument reaches too far. Angelique Kidjo’s album Oyaya! (2004) may represent the diasporic relationship between West African and Afro-Cuban music, but this is not the case with Ahab’s music. I connect “Nickerson’s Theme” and “Conga Habanera” because they both feature a certain rhythmic pattern we can trace from West Africa to the Caribbean and, ultimately, to the musical language of almost all contemporary popular music. But, this does not mean Ahab’s music makes an equivalent statement regarding the African diaspora. I also believe it is inappropriate to say “Nickerson’s Theme” represents “resistance against oppression”, even though heavy metal is an iconoclastic, subversive genre. Ahab’s repurposing of distinctive musical materials we can trace to some African musical practices, obscuring their origins and separating them from their heritage, is an unwitting example of white supremacy’s cultural domination, not a radical assault on unjust hierarchies of power.
The final paragraph here is also vague: ChatGPT provides no details as to how Ahab’s music expresses its purported connection to African and Afro-Caribbean traditions. So, I follow up and ask the program to be more specific on this matter, which prompts ChatGPT to blatantly contradict itself:
The other topic for this week is Robin James’ wonderful book, The Sonic Episteme. I assign my students an excerpt that discusses neoliberalism, the ‘chill aesthetic’, and contrasting songs by Harry Styles and Ed Sheeran, among other things. For their homework, I ask my students to simply write two short takeaways based on the reading, as more detailed engagement with Robin’s work will come in class. So, I prompt ChatGPT to do the same (though, about the whole book):
These are clearly vague, but they are not inaccurate to the text. I will be honest and say that I would find these takeaways totally acceptable if a student submitted them. But, my expectations with this kind of assignment are open, as these short submissions are only one of at least four ways we engage with the given excerpt of The Sonic Episteme each semester (the others are: lecture, full-group discussion, and annotated small-group discussion; this also doesn’t count other comparative work we might do with materials introduced later in the term).
The new version of ChatGPT is certainly more powerful and, in a way, daring. Like in the discussion of “Nickerson’s Theme”, here we see a prompt that requires familiarity with less-widely-known materials generate a legitimate attempt instead of a timid refusal. More impressive is the following comparison of Harry Styles’ “Sign Of The Times” and Ed Sheeran’s “Shape Of You”, which are two of the examples James refers to in the selection from The Sonic Episteme I assign my students:
Not only is this response totally fine, it once again shows that ChatGPT is, all of a sudden, comfortable writing analytically about musical examples. Yet, if I received this paragraph in my class, I would ask for a more specific discussion of musical characteristics (I want students to be more detailed than “emotive piano”, and I would invite them to describe the content in the piano part that communicates this so-called ‘emotive’ sensibility). It is hard to articulate, but there is a glib, pat quality to this, which, frankly, is not uncommon in human students’ work either. I can’t ignore the subtle feeling that this submission’s author has only read about these songs, not actually listened to them (which is kind of true, right?).
Despite ChatGPT’s improvement, it still has major flaws with respect to overall specificity and the accuracy of the details it asserts. When I asked the program to explain how The Sonic Episteme defines “chill aesthetic” (one of the important concepts in the excerpt I assign my students), I received an ostensibly convincing response that, in reality, has very little to do with what Robin writes in her book:
Final Grade: C
This week’s grade is mostly a sign of respect for ChatGPT’s growth beyond the limitations of its previous version. Like with my human students, I am inclined to reward the software (and its engineers!) for pushing itself beyond the boundaries of its abilities and attempting work that it previously thought impossible. But, as always, there are problems with the program’s responses, although I will acknowledge this week’s submissions are much more acceptable than what I have seen earlier in this series.
For many of you, it probably seems that this update means ChatGPT has become a more effective tool for students who want to cut corners, but I do not necessarily agree. It now appears the software is willing to take on prompts for any of my course’s short takeaway assignments, and it handles these well enough. But, we must remember this work represents only one of many points of engagement students will have with these assigned materials. And, even though ChatGPT’s longer written submissions look better now, their tendency toward generalization leads to weak arguments and frequent factual errors.
This is where ChatGPT may present a trap to students who aim to use it to do more than what it does most capably. Consider the response regarding “chill aesthetic” in Robin James’ The Sonic Episteme: it reads well, it possesses the program’s signature confidence, and there is no odd language that might signal the text’s origins. But, the software’s work is blatantly incorrect. It is not hard to imagine a situation in which ChatGPT’s slickness fools a student user into submitting an erroneous assignment. If students understand the subject matter in question, they should be able to successfully use the program as a support system for their written work, but, in this case, they aren’t cheating at all: they are just leaning on it in a new way.