Sunday, May 28, 2023

Professional Use of AI

Not so long ago, I wrote a piece on some of the ethical questions that Large Language Model (LLM) Artificial Intelligence (AI) raises. In the last week, not one but two topics were brought to my attention that I consider examples of the dangers of naive application of AI.

First up is an experienced lawyer who used ChatGPT to supplement his legal research for a case. I’m not going to slam the lawyer too hard for this - this is clearly a case of a “naive user” making some very dangerous assumptions about how the technology operates. He basically assumed that ChatGPT wouldn’t lie to him - largely because he seems to have thought that it was like a natural language front end to a search engine. It isn’t, and never was. 

What ChatGPT did was literally make things up. It fabricated entire cases out of thin air and then assured the user that it all came from credible (but unnamed) sources.

Now, if this work had been done by an articling student, or by a legal assistant, the lawyer would have an ethical duty to verify the work adequately before submitting it to the courts. One would assume that an articling student who simply fabricated an entire story from whole cloth would find themselves dismissed immediately.  

What do we do with a lawyer who used an AI in a similar manner? At the most basic level of ethical analysis, the lawyer had a duty to verify the work of the AI fully and thoroughly before incorporating it into the submission to the court. But at the same time, the lawyer is unlikely to do more with the AI’s output than he would with a student’s - take a handful of references and verify them. If you have a lengthy list of references, you’re probably going to check a sample and move on. If your student fabricated an entire case out of thin air, you might miss it, but then again, so might the courts.

I’ll come back to this shortly - I want to introduce the second item. A colleague brought to my attention a psychiatric testing and diagnosis service called CliniCom. CliniCom is not merely a clinical records management system; it purports to be an AI-powered assessment and tracking system.


My first “yikes” around this is the idea of an AI being used for diagnosis - especially in mental health, where diagnosis is complex on a good day. The risk here of clinicians coming to rely on the AI’s opinion in place of genuine consultation is huge and very worrisome. If the AI produces “reasonable” diagnoses most of the time, it’s very easy (and human) to fall into the pattern of relying on its opinion over both your own assessment and that of your peers.

CliniCom does have a single published paper demonstrating that the tool itself has been subjected to some degree of validation. To be clear, I do not believe that this comes anywhere near the level of scrutiny that needs to be applied to the development of a novel approach to mental health assessment. In fact, the paper itself does not address a number of significant aspects that need to be considered.

(For clarity, I will not be doing a detailed analysis of this paper here - that would be a substantial task in its own right, worthy of its own post.)

In both of these cases, existing professional practice ethics place the burden on the practitioner to appropriately validate the use of any such technology. That is to say, whether one is making a diagnostic assessment of a patient or formulating a brief to put before the courts, it is ultimately the practitioner’s responsibility to ensure that the resulting document is accurate and objective.

You might look at that and say “well, we’re all good here, move along”.  But we aren’t “all good”. Not even close to “good” at this moment. 

Consider the lawyer in the first case for a moment. Here we have an experienced lawyer with decades in the field, and yet using the technology inappropriately led him to create a submission riddled not only with errors, but with outright fiction. We can legitimately argue that the error is purely on him, and from certain perspectives, that’s completely true. But can we ignore the fact that ChatGPT not only fabricated rulings and quotations out of thin air, but then assured the user that they came from “reputable databases”?

This is one of the first points where we have to talk about the difference between human intelligence and an AI. Humans have emotions. There are all sorts of cues when someone is lying, and the reactions we encounter when we tell lies while growing up help form an understanding of when it is appropriate to invent things (when we’re writing songs, poetry, or fiction, for example) and when it is very, very wrong (in a court of law, say). At this stage of development, AI most certainly lacks that understanding in any meaningful sense.

In other words, with ChatGPT and its relatives, we have created an AI that is quite capable of generating its own fictions, but it lacks entirely any kind of moral and ethical framework from which to understand whether or not that is appropriate. 

Humans are very good at working with approximate information. We aren’t so good at dealing with large amounts of highly detailed information. Think about what happens when a group of you go out for a meal at a restaurant and split the bill. You look at the bill, divide it by the number of people, and it comes out to roughly $20 a person, so you throw in $25 to cover your portion and the tip. Do the same exercise with a calculator, and it comes out to $18 plus $3.60 for the tip. We approximate - it’s easy to overwhelm us with details, and an AI can easily produce a deluge of both information and misinformation that would be next to impossible for any one human to sort through to pick out the useful bits.
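To make that contrast concrete, here is a minimal sketch of the exact arithmetic a machine performs versus the rounded figure most of us actually put on the table. The numbers (a $108 bill, six people, a 20% tip) are assumed purely for illustration.

```python
# Minimal sketch: exact vs. approximate bill splitting.
# All figures (a $108 bill, six people, a 20% tip) are assumed for illustration.

bill_total = 108.00
people = 6
tip_rate = 0.20

exact_share = bill_total / people              # 18.00 per person
exact_tip = exact_share * tip_rate             # 3.60 per person
exact_contribution = exact_share + exact_tip   # 21.60 per person

human_contribution = 25.00  # "roughly $20 each, so throw in $25 to cover the tip"

print(f"Exact:       ${exact_share:.2f} + ${exact_tip:.2f} tip = ${exact_contribution:.2f}")
print(f"Approximate: ${human_contribution:.2f}")
```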

In a field such as healthcare, the issues become even more difficult for a number of reasons.  First, matters of patient confidentiality suddenly come to the foreground. Legally speaking, you can sign all of the waivers, agreements, and so on you like, but that doesn’t do much to change the fact that your confidentiality must be guaranteed by the practitioner. 

“Oh, but just strip the names and other identifying information off the data fed to the AI, right?” Not so fast. Sure, you can remove the name, address, and phone number fairly easily, but what about date of birth? Age is a significant factor in a lot of health care contexts. Similarly, because CliniCom uses some kind of “adaptive” approach to assessment, the combination of questions that a client answers may well be a form of identifying information in its own right.
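To illustrate why the question pattern itself matters, here is a small, entirely hypothetical sketch - the records, field names, and values are all invented - showing how a few seemingly harmless attributes can narrow “de-identified” data down to a single person.

```python
# Hypothetical sketch: re-identification from quasi-identifiers.
# Every record, field name, and value below is invented purely for illustration.

records = [
    {"age": 34, "postal_prefix": "T2N", "question_path": ("A3", "B7", "C1")},
    {"age": 34, "postal_prefix": "T2N", "question_path": ("A3", "B2", "C1")},
    {"age": 51, "postal_prefix": "V6K", "question_path": ("A3", "B7", "C1")},
]

def matching_records(data, **known_attributes):
    """Return the records consistent with everything an outside party already knows."""
    return [r for r in data if all(r.get(k) == v for k, v in known_attributes.items())]

# Someone who knows a client's age, rough location, and which path an adaptive
# assessment took them down may need nothing more to single out their record.
candidates = matching_records(records, age=34, postal_prefix="T2N",
                              question_path=("A3", "B7", "C1"))
print(len(candidates))  # 1 -> the "anonymized" record points to one individual
```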

The other aspect of CliniCom that I am deeply concerned about is cultural bias, both in the questions it uses for assessment and in the dataset used to train the AI. Testing the construct validity of instruments across cultural lines is a large and very complex task. Questions that are meaningful in middle-income America may mean nothing at all to someone whose cultural background is rural China. Closer to home, even between Canada and the US there can be significant differences, and then we get into discussions of how aboriginal peoples may view those same matters.

When we come to examine the datasets used to train the AI, those same issues of bias and cultural awareness come to the surface. Guess what? That can significantly impact how the AI interprets the responses from a patient. 

As an example, many North American aboriginal peoples have a tradition around what are called Vision Quests. A properly trained professional will understand what those are and the degree of reality that the patient may ascribe to the experiences involved, but an AI not properly trained to understand that (or worse, one that fails to understand it at all) may arrive at the conclusion that the patient is hallucinating and possibly in the throes of psychosis.

The consequences from a diagnostic and treatment perspective are enormous. An improperly trained AI may well draw conclusions that … well … are horribly incorrect. 

To be clear, I am not privy to the inner workings of CliniCom’s platform here, but these are considerations that came to mind as I reviewed their website and noticed some of the topics it fails to address. I am not saying that these problems exist, but rather that the possibility of them existing is very real, and that from a societal perspective, they should be viewed as topics worthy of further exploration and consideration as we develop this technology. Practitioners in the field should be doubly cautious, and use CliniCom in conjunction with other tools to verify its appropriateness.

Likewise, in other domains such as law, academic works, etc., we need to be clear that a lot more work is needed before AI can be used reliably to assist in those domains. Simply assuming that the AI is benign is not adequate. Even if the AI is not acting out of malice, the ability of AI like ChatGPT to simply make things up is enormously problematic.

Creators of LLM AI like ChatGPT now have an additional set of tasks to consider. Yes, ChatGPT is capable of generating new content that looks a lot like existing content. (For example, one can ask ChatGPT to write “The Night Before Christmas” in the style of Terry Pratchett and get a disturbingly real-seeming result.) That is no small accomplishment. Now we need to teach these AI constructs when it is appropriate to “invent” and when doing so is an absolute no-go.

Users of AI also need to become more proficient at validating the results that an AI produces. If we thought misinformation was already a problem on the Internet when it was mostly people producing the content, can you imagine how quickly even the most useful of sites could be overrun with real-looking but utterly nonsensical material?

None of this is to say that AI should not be developed. Not at all. But, it is vital that both creators and users of AI take steps to ensure that the technology is in fact used appropriately. If you are a lawyer, doctor, engineer, or other professional, it is incumbent upon you to ensure that you appropriately validate anything that you use an AI for in the course of your work. 
