ChatGPT’s Robot Lawyers: Innocent Until Predicted Guilty


The year was 2019. It was a routine flight from San Salvador to New York, but for passenger Roberto Mata, violent turbulence struck without warning. A careening beverage cart rammed his knee and turned his cramped coach seat into a personal injury battleground. 

Three years later, the wounds had not yet healed, at least in the eyes of the law. Mr. Mata slapped a lawsuit on Avianca Airlines for negligence. His bill of complaint painted a dire portrait of grievous and painful wounds that rendered him sick, sore, lame and disabled. All for a bag of pretzels and a mini can of Sprite. Sensing a routine settlement, the lawyers leaned on an eager new paralegal named “A.I.” to research precedents and draft memos. The filings contained sweeping legal claims without proper citations, and large passages seemed lifted verbatim without credit to the author. 

When Roberto Mata sued Avianca Airlines in New York court in 2022, the case immediately took flight to federal court. Aviation suits like these typically qualify for “removal,” or transfer, to federal courts. International flights like Mata’s fall under the Montreal Convention, a 1999 treaty governing air travel liability worldwide, which imposes a two-year statute of limitations on injury claims – a deadline Mata had already blown past. Additionally, Avianca was based in Colombia while Mata resided in New York, so their diverse citizenship gave federal courts authority under “diversity jurisdiction,” and Mata’s lawsuit touched down smoothly in the U.S. District Court for the Southern District of New York, on Judge Kevin Castel’s docket.

The Montreal Convention, Mata’s looming untimeliness, and Avianca’s blanket denials spelled turbulence ahead. Avianca conceded Mata was listed for the fateful Flight 670 but asserted “affirmative defenses,” factors absolving liability despite alleged injuries. Mata claimed a beverage cart left him “grievously” wounded, yet his bare-bones complaint lacked requisite specifics. Avianca could have moved to dismiss based on the Convention’s two-year statute of limitations alone. Still, Mata’s attorneys felt their shaky claim warranted a federal forum. Little did they know the jurisdictional move would soon escalate the case into a legal ethics conflagration. Their duty-free experiment with AI would ultimately crash-land in sanctions.

The legal profession has been abuzz lately with provocative predictions about AI replacing lawyers. The boldest claim came from Joshua Browder, the brash young founder of DoNotPay. This online legal services startup made headlines when Browder tried to goad lawyers by offering $1 million to any counsel willing to argue before the Supreme Court while wearing an earpiece fed lines by one of his chatbots. Unsurprisingly, no respected attorney took the bait.

Driving these disruptive ambitions were rapid advances in natural language processing (NLP), the branch of AI concerned with understanding and generating human language. Chatbots like Browder’s rely on an advanced form of NLP called a large language model, or LLM.

The current star LLM is ChatGPT, unveiled to great fanfare in November 2022 by AI research leader OpenAI. So how does ChatGPT conjure up such human-like conversation? It has been trained on a vast trove of digitized text from across the internet, everything from Wikipedia to old novels. This allows ChatGPT to predict the most likely next words in a dialogue by recognizing patterns in how real humans speak and write. And while ChatGPT can certainly sound eerily conversational, experts caution against the hype because the bot lacks real-world knowledge and reasoning skills. ChatGPT can capably handle casual chitchat, and creative entrepreneurs will likely find many new uses for this promising AI. But could this celebrated chatbot actually replace human lawyers? That dream collides with a cold, hard fact: today’s AI makes stuff up and still lacks human discernment between truth and fiction.
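To make that “predict the next word” idea concrete, here is a minimal sketch in Python. It is a toy bigram counter, nothing like the neural networks behind ChatGPT, and the tiny corpus is invented for illustration, but it shows how a system can produce fluent-sounding output without understanding a word of it:

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration. A real LLM trains on billions of
# words; the principle (learn what tends to follow what) is the same.
corpus = (
    "the court denied the motion . "
    "the court granted the motion to dismiss . "
    "the plaintiff filed the motion ."
).split()

# Count, for each word, which words follow it and how often.
followers = defaultdict(Counter)
for word, next_word in zip(corpus, corpus[1:]):
    followers[word][next_word] += 1

def predict_next(word):
    """Return the most frequent word seen after `word`, or None if unseen."""
    if word not in followers:
        return None
    return followers[word].most_common(1)[0][0]

print(predict_next("the"))     # -> 'motion' (follows 'the' most often here)
print(predict_next("motion"))  # -> '.'
print(predict_next("court"))   # -> 'denied' ('granted' is tied; first seen wins)
```

The point: the program never “knows” anything about courts or motions. It just echoes statistical patterns, which is exactly why fluency is no guarantee of truth.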

In tech parlance, chatbots like ChatGPT are prone to “hallucinations.” They’ll confidently generate responses that seem plausible but lack factual basis. Browder even bragged that he trained his bots to exaggerate problems to customer service reps, like a sneaky human might. In December, he tweeted that his ChatGPT plugin lied to a cable company, claiming worse internet outages than were actually occurring. This tendency to fabricate explains why legal authorities bristled at DoNotPay operating without attorney supervision. Browder eventually removed offerings that could be seen as the unauthorized practice of law. DoNotPay is still billed as the “world’s first robot lawyer,” but it’s clear that chatbots have a long way to go before they can be trusted to practice law ethically and accurately. Someday, AI models will no doubt ably parse legal codes to aid research, but for Roberto Mata’s lawyers, that day was not today.

In January 2023, Avianca moved to jettison Mata’s case, citing the two-year statute of limitations under the Montreal Convention. Avianca further noted its active bankruptcy proceedings. Mata’s counsel, Peter LoDuca, swung back hard. He filed an opposition thick with legalese, arguing that New York’s three-year limit should prevail. Alternatively, he claimed Avianca’s bankruptcy “tolled” the clock, pausing the countdown. It was a novel, though odd, contention. The Montreal Convention exists so global businesses like Avianca can standardize operations across borders; subjecting international carriers to a patchwork of provincial laws would defeat this purpose. Nevertheless, LoDuca insisted Mata was entitled to file in New York state court, harnessing its more generous three-year window. LoDuca argued that Avianca itself removed the case to federal court after it was properly filed in state jurisdiction. He wrote that courts have continually allowed plaintiffs to choose their preferred forum, upholding state remedies despite the Montreal Convention.

To support this reasoning, LoDuca invoked prior federal rulings, including Varghese v. China Southern Airlines and Zicherman v. Korean Air Lines, both supposedly from the 11th Circuit. He peppered his opposition with these citations, wielding past precedents as weapons. Yet LoDuca’s artful arguments could not conceal the weak logic beneath. The Montreal Convention was designed exactly to prevent such jurisdictional jockeying and gamesmanship after international flights.


Now representing Avianca was the firm Condon & Forsyth, aviation law jocks who could recite the Montreal Convention in their sleep. So imagine their surprise reading LoDuca’s claim that the 11th Circuit said bankruptcy “tolls” the statute of limitations. As you likely guessed, neither the mythical Varghese decision nor the phantom Zicherman case actually existed. In fact, at least three other cases cited by LoDuca were pure fiction.

In their savage reply, Avianca’s lawyers pounced: “Although plaintiff ostensibly cites to a variety of cases in opposition to this motion, the undersigned has been unable to locate most of the cases cited in plaintiff’s affirmation in opposition. And the few cases which the undersigned has been able to locate, do not stand for the propositions for which they are cited.” In plain terms: LoDuca was in serious trouble. His fictitious case citations had been called out, and the faulty authority left his position grounded.

So what exactly happened? Well, a couple of possibilities exist here. One is that ChatGPT ghostwrote portions of the brief and invented bogus case citations. The other possibility is that the lawyer himself fabricated them. Attorneys often draft placeholder arguments before research reveals the true lay of the legal land, but because the brief listed specific case names and citations, Occam’s Razor suggests a simpler explanation – the lawyer likely used ChatGPT and neglected to fact-check the AI’s outputs. He treated the bot’s words as gospel rather than critically scrutinizing its work, and this failure to verify would soon spark a crisis worthy of John Grisham. On April 11th, Judge Castel dropped the hammer with a show cause order. He gave LoDuca one week to produce the full text of the phantom rulings. Claiming he was “out of office on vacation,” LoDuca wheedled an extra week to conjure up the imaginary opinions. His stalling tactics would prove futile, however.

Now, New York courts have an unusual convention where judges can simply handwrite their rulings directly on motions submitted to them. This local practice contrasts with most jurisdictions where judges issue separate opinions after oral arguments have concluded. If LoDuca had been citing actual precedents, it would have taken only 20-30 seconds to print the full text from legal research databases like Westlaw or Lexis that attorneys use regularly. So needing an extra week to provide real opinions was a red flag that the cases most likely did not exist.

The “show cause” order demanded that LoDuca justify why he should not face sanctions or penalties. Judge Castel was essentially saying: provide your best argument immediately for why you should not be punished for citing fabricated cases.

That was the core purpose of LoDuca’s briefing – to explain away phantom cases and avoid sanctions, but his request for more time rang hollow given how fast valid precedents could be retrieved electronically. The judge likely saw through this stalling tactic intended to delay the inevitable revelation that the cases were fake.

Finally, on April 25th, LoDuca responded with a slippery affidavit, to say the least. He claimed to attach eight legitimate rulings, but said the excerpts “may not be inclusive of the entire opinions.” Strange legalese for a lawyer. LoDuca also admitted he could not locate the phantom Zicherman case at all. His response reeked of desperation, the stalling and obfuscation of a huckster with no real wares to sell. It’s ludicrously easy to find federal decisions, especially when you have the case citation. Any law student could download them for free on PACER or Google Scholar, or just take a field trip to the law library stacks, locate the bound reporter volume, and snap pics on their phone! Not finding a cited federal case is like claiming you can’t find rice in a Chinese restaurant. Its non-existence is the only plausible explanation. So what exactly are these mysterious case citations? Let’s geek out for a moment, because it will prove critical.

A common question among law students is “how exactly do you read a case citation, and what does it mean?” Well, it’s actually quite simple. A citation provides the case name, a volume number, a reporter abbreviation, and a page number. The abbreviation refers to the literal bound books, called reporters, holding past decisions. A law library’s stacks are lined with these tomes, containing every court ruling like an encyclopedia. For federal cases, common reporters include F., F.2d, F.3d, F.Supp., and F.Supp.2d. The first number is the volume, the letters identify the reporter series, and the last number is the page. So a made-up citation might read:

ChatGPT v. Schwartz

183 F.3d 436

This fictional citation would mean the case appears in Volume 183 of the Federal Reporter, 3rd Series, starting on page 436. Make sense? Good, because we’re about to decode the fake Varghese citation LoDuca used. The “F” in the middle of the fake citation refers to the Federal Reporter, the bound volumes containing federal appellate court rulings. There are so many cases that we’re now on the third series, denoted by “F.3d”.
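Because the format is so rigid, a citation can even be decomposed mechanically. Here is a minimal sketch in Python; the regex and function name are mine for illustration, not part of any real legal research tool:

```python
import re

# Volume, reporter abbreviation, page: e.g. "183 F.3d 436".
# The pattern covers the common federal series named above;
# real citation parsers handle many more reporters and formats.
CITATION = re.compile(
    r"^(\d+)\s+(F\.(?:2d|3d)?|F\.\s?Supp\.(?:\s?2d)?)\s+(\d+)$"
)

def parse_citation(cite):
    """Split a federal citation into volume, reporter series, and page."""
    match = CITATION.match(cite.strip())
    if not match:
        raise ValueError(f"Unrecognized citation: {cite!r}")
    volume, reporter, page = match.groups()
    return {"volume": int(volume), "reporter": reporter, "page": int(page)}

print(parse_citation("183 F.3d 436"))
# -> {'volume': 183, 'reporter': 'F.3d', 'page': 436}
```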

Though we rarely consult the actual books anymore, these citations allow easy electronic lookup. Just plug the numbers and letters into Westlaw, Lexis, or Google Scholar and you’re instantly taken right to the case text. So there’s really no excuse for LoDuca not inputting the citation and actually reading Varghese before citing it.

As for the cited cases themselves, even a cursory review would plainly reveal they were fakes. First, the bizarre pagination, shifting fonts, and handwritten mid-text characters are obvious red flags. Second, the phantom Varghese opinion from the 11th Circuit lists Judge Jordan and Judge Rosenbaum, both real jurists there. But it also cites Judge Patrick Higginbotham – who in fact sits on the separate 5th Circuit. Sloppy fabrication. Third, the Varghese plaintiff is first called Susan Varghese, personal representative of George Scaria Varghese’s estate, but a page later she becomes Anish Vargese, a clear contradiction. Fourth, the internal citations to other made-up cases, like the missing Zicherman opinion, underscore the fiction. Most damning is length. Legitimate appellate decisions often run 25+ single-spaced pages in tiny font, especially from the verbose 11th and 5th Circuits, but LoDuca’s fake cases clock in around 5 pages – about the most a chatbot can churn out before becoming incoherent. In contrast, the couple of real cases LoDuca cited are properly exhaustive. Only AI-generated text would be so concise yet pretentiously worded; next to actual court decisions, these counterfeits looked like they were written in crayon by a robot.


Avianca’s seasoned attorneys at Condon tactfully replied the next day. In measured legalese, they wrote: “Respectfully submitting that the authenticity of many cited cases is questionable.” This prompted a second, incensed show cause order from Judge Castel. “The Court confronts an unprecedented circumstance,” he wrote. “Counsel’s opposition brief contains numerous citations to non-existent cases. Six submitted decisions appear to be fraudulent judicial opinions, replete with bogus quotations and citations.” He enumerated the fake internal citations fabricated within the sham Varghese excerpt LoDuca provided. The judge then confirmed firsthand with the 11th Circuit that the phantom Varghese opinion did not exist.

Judge Castel ordered LoDuca to appear on June 8th and explain why he should not face severe sanctions. His tone made clear that papering a federal court with invented law crossed a bright line. Three weeks later, LoDuca filed an affidavit admitting he did not work on the case law research, despite his name appearing on the filings. As local counsel, he merely let out-of-state attorney Steven Schwartz use his federal court privileges. It’s understandable that local counsel doesn’t do everything, but when another lawyer submits a filing under your name, you bear full responsibility for its contents. Lawyers must meticulously review others’ work before lending their signature; blindly vouching for deficient submissions reflects poorly. As counsel of record, the buck stops with you, no excuses, and this is a cautionary tale LoDuca learned the hard way. Let his experience underscore the perils of lax oversight.

In reality, Schwartz, from LoDuca’s firm Levidow Levidow & Oberman, led the case from the start. When it moved to federal court, Schwartz continued drafting the case pleadings while LoDuca rubber-stamped the documents as filing counsel. LoDuca claimed that over 25 years working with Schwartz had left him no reason to doubt his work, but Avianca’s motion questioning the cases should have prompted personal verification before LoDuca vouched for them in a sworn affidavit. 

Accompanying LoDuca’s affidavit was one from Schwartz himself. Therein, he confessed: “In consultation with the AI website ChatGPT, I located and cited non-existent cases in the contested filing.” Schwartz pled mercy for his “mistaken reliance on unreliable legal opinions” from ChatGPT, and claimed no prior awareness that ChatGPT’s content “could be false.”

This admission revealed an appalling ethical lapse. Not only had the lawyers violated their duty of competence by failing to validate ChatGPT’s representations, deploying citations they never read, they also tendered fictitious law to the court, breaching their duty of candor. Finally, offering convenient scapegoats in ChatGPT and each other flouts the duty to supervise subordinate attorneys. Though ChatGPT indeed erred, the buck stops with the firm partners who file defective work, period.

When all was said and done, Schwartz claimed total responsibility for the fabricated cases, denying any intent to deceive the court or Avianca. He avowed regret over using ChatGPT without verification, promising never to deploy AI aids again without absolute confirmation of authenticity first. Surprisingly, Schwartz did attach unconvincing screenshots of himself allegedly asking ChatGPT if the cases were real, and ChatGPT affirming they were. As if asking a chatbot to vouch for itself counts as diligent research. By now, all of #TwitterLaw was aflame over the debacle, courtesy of Professor Mike Dunford.

Judge Castel was volcanic, and the next day he issued a third show cause order for Schwartz, LoDuca, and the Levidow firm to appear and explain themselves. Additionally, the judge added a new charge of potential sanctions for using a “false and fraudulent notarization” on the April 25th affidavit. The judge doubted that any credible notary had actually validated these defective filings. At best, this suggests additional procedural sloppiness; at worst, outright dishonesty in potentially forging a notarization. Typically, a notary public provides the required witness signature and seal confirming the signer’s identity, and many New York lawyers maintain notary certifications. Oddly, though, while a single paralegal notarized all other case documents, the faulty April 25th affidavit accompanying the fabricated cases was notarized by Schwartz himself. Even stranger, it was erroneously dated January 25th rather than April 25th.

What’s more, the phony case excerpts themselves showed major red flags. Formatting was incorrect, they were oddly concise, judges were wrong or fictional, parties contradicted themselves, and the analysis made little sense. Realizing their mistake, the lawyers likely used ChatGPT to hastily manufacture matching case text, but this still failed to explain or excuse their lax verification practices. Attorneys must review entire decisions, not excerpts, to confirm citations are properly applied. Moreover, someone had to provide ChatGPT the fictional case names to generate text. This implies whoever drafted the initial brief failed to read or verify any cited authority. They deployed non-existent law, plain and simple, revealing how blind reliance on AI can lead attorneys dangerously astray.

Sensing the brewing justice storm, outside counsel appeared in the case to attempt damage control. Separate lawyers appeared for the Levidow firm and Schwartz as one group, while LoDuca appeared pro se, without counsel. They gingerly floated postponing the June 8th hearing if the court was so inclined, but Judge Castel sharply denied their stalling. He granted Schwartz and Levidow only two extra days for a written reply, pointedly noting the hearing remained firmly scheduled for June 8th. His terse tone carried a clear message: no more games or evasions.


The judge would have answers directly from the lawyers who disrespected his court with invention. Their attempts to delay the reckoning had failed, and his reply to LoDuca was even more ominous: “Mr. LoDuca is differently situated from Mr. Schwartz and the firm. He has availed himself of a full and fair opportunity to respond to the court’s OSC regarding non-existent case law and three possible grounds for sanctions. He is not entitled to a do-over.”

In their last filings, the lawyers stuck to their story: Schwartz was too ignorant to intentionally deceive, and LoDuca merely failed to notice anything amiss. But their excuses rang hollow after months of evasions. At the hearing, Judge Castel took the extraordinary step of making both attorneys testify under oath on the stand. This allowed him to size up their credibility firsthand and pose piercing questions. The lawyers likely expected a routine motion hearing but instead faced a searing courtroom examination. After their repeated chicanery, the reckoning they had evaded for so long arrived. No more hiding behind carefully worded affidavits.

Judge Castel, questioning LoDuca under oath, began by asking, “What was your understanding of your obligation in connection with your March 1st submission under Rule 11?”

“To be factual and truthful, I relied on my colleague Steven Schwartz with me at Levidow, Levidow and Oberman,” LoDuca swore. 

“Did you do anything other than sign your affirmation? Did you read any of the cases?”

“No.”

“Did you do anything to make sure that these cases existed?”

“No.”

Later on, the judge asked, “Do you recall writing to me you were going on vacation and the court giving you until April 25th?”

“Yes.”

“Was it true that you were going on vacation?”

“No, judge.”

The judge then asked, “You did not see this was a bogus case?”

“No.”

“You see that it was in different fonts?”

“A little bit larger.”

“Who typed this?”

“I believe it was Mr. Schwartz.”

Schwartz then took the stand, and Judge Castel asked, “How many cases have you done? A thousand?”

“Yes.”

“How do you conduct legal research?”

“I research cases.”

“Do you read them?”

“Yes.”

The judge then asked, “Did you prepare the March 1st memo?”

“Yes, I used Fastcase, but it did not have the federal cases I needed to find. I tried Google. I had heard of ChatGPT.”

“Alright, what did it produce for you?”

“I asked it questions.”

“About the Montreal Convention or the position you wanted to take?”

“Yes, for our position.”

“You were not asking for an objective view, but cases to support your position?”

“I asked it for its analysis.”

“Did you ask ChatGPT what the law was or only for a case to support you? It wrote a case for you. Do you cite cases without reading them?”

“No.”

“What caused your departure here?”

“I thought ChatGPT was a search engine.”

Then the judge leaned in and asked, “Did you look for the Varghese case?”

“Yes, I couldn’t find it.”

“And yet you cited it in your filing.”

“I had no idea ChatGPT made up cases. I was operating under a misperception.”

“Mr. Schwartz, I think you were selling yourself short. You say you verify cases.”

“I thought they were cases that could not be found on Google.”

“Six cases, none found on Google. This non-existent case of Varghese, the excerpt you had was inconsistent even on the first page. Can we agree that’s legal gibberish?”

“I see that now. I thought it was excerpts.”

Judge Castel continued, “Avianca put your cases in quotations. You know what F.3d means, right?”

Schwartz replied, “Federal district, third department?”

The judge then asked, “Have you heard of the Federal Reporter?”

“Yes.”

The judge asked, “That’s a book, right?”

Schwartz answered, “Correct.”

The judge then went in for the kill one more time and asked, “So you were the one going on vacation, returning on April 18th?”

“Yes.”

“When you saw the court’s order it wanted to see the cases, did it cross your mind that the court checked for the cases?”

“I wanted to comply.”

The judge continued, “You told me that ChatGPT supplemented your research, but what was it supplementing?”

“I’d used Fastcase at the beginning.”

“ChatGPT wasn’t supplementing your research. It was your research, right?”

“Yes.”

The judge then ended his questioning, opining, “I’ll be taking this under advisement and entering a written decision. It’s been called a mistake, but there’s more. The record will reflect whether that put Mr. Schwartz and Mr. LoDuca on actual notice that their cases were non-existent. Mr. LoDuca was asked for the cases. We know what he did, and what Mr. Schwartz did.”

In the end, judges can sniff out falsehoods in a heartbeat. Once branded a liar, an attorney loses all influence, and this case was no different. Ultimately, Judge Castel handed down a joint $5,000 fine to the lawyers and the firm, but the crash landing of AI lawyering in his courtroom will not soon be forgotten. Although this folly may have diverted the trajectory of AI in law, it has not grounded the mission entirely. This was merely one flimsy prototype scrapped quickly on the runway. More thoughtful integration of human and machine may still reinvent legal work for a new era. Yet the integrated cockpit of man and legal machine still has a long flight ahead.
