Literature Review on AI in Law

Isamu Isozaki
21 min read · Jan 27, 2024

This blog was inspired by Owl from the LAION Discord server. Thanks for the discussions! In this blog, my main goal is to go through why law is a very hard problem (which explains why law works the way it does today), what data is publicly available on law, and what research is currently being done there!

Why Law is hard

Here, we will first examine why replacing judges is difficult by going over three fundamental reasons why law is hard.

Logic

The first idea that I had for law was to use logic. Then, given the laws and the evidence, a program could tell us whether someone is guilty! This would be great for avoiding “biases” in judgments and could be completely impartial.

But then why do we still have judges? Why don’t we just have the defense and the prosecution argue, and then have a computer logically conclude who is guilty?

The reason is simple. Laws operate on a kind of logic that computers are bad at, called nonmonotonic logic. Now, what is nonmonotonic logic?

Nonmonotonic Logic

Nonmonotonic logic is logic where both sides can be correct. That is, the rules are formulated in such a way that previous conclusions, or even individual laws, can contradict each other in certain situations. So, even if you went through all the effort of proving a person guilty or not guilty from the laws and evidence, there can be a twist that makes it all invalid! The example I got was the Tweety bird problem.

If we have the axioms “birds can fly” and “Tweety is a bird”, then “Tweety can fly” follows. While that is good on paper, if Tweety turns out to be a penguin, he can’t fly. So there is a contradiction!
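To make this concrete, here is a minimal Python sketch (entirely illustrative, my own naming) of a default rule being defeated. Note how adding a fact removes a conclusion, which monotonic logic never allows:

def can_fly(facts):
    # Default rule: birds fly, *unless* we also learn the bird is a penguin.
    return "bird" in facts and "penguin" not in facts

print(can_fly({"bird"}))             # True: Tweety is a bird, so Tweety flies
print(can_fly({"bird", "penguin"}))  # False: a new fact retracted the old conclusion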

A more real-world example I found of this is Mapp v. Ohio.

Mapp v. Ohio

This is the court case that made material from illegal police searches inadmissible in court. This is called the exclusionary rule. For the explanation of the judges’ ruling, I read from here, thanks to the nonprofit Free Law Project!

In the US Supreme Court case of Mapp v. Ohio, a woman was found to have obscene material after an illegitimate search of her property. The interesting thing about this case was that while the Fourth Amendment of the Constitution says that police can’t do “unreasonable searches”, in a previous court case called Wolf v. Colorado, the Supreme Court had said evidence obtained without a warrant was still admissible in court. So what Mapp v. Ohio did was overturn the Wolf v. Colorado decision, granting a retrial.

While this is historically interesting, mechanism-wise a few things that I found interesting were:

  1. Judges can reach different conclusions given the same case, rules, and evidence. This can be seen in the dissenting opinions of some judges in court cases (like at the Supreme Court).
  2. While judges, at least in the US, reference previous cases, those cases are not held as a gold standard and can be overruled.
  3. The reasoning behind overruling seems to be a logical contradiction, but it doesn’t have to have been a contradiction at the time of the previous ruling. For example, Wolf v. Colorado cited that “almost two-thirds of the States were opposed to the use of the exclusionary rule” as one of its reasons, while Mapp v. Ohio cited that since then “more than half of those since passing upon it, by their own legislative or judicial decision” had adopted it. This is a classic case of new evidence undermining the foundation of the previous case, making it false!

Here, I’d like to point out the first fundamental issue with law, which a Hugging Face community member, singh, pointed out (thanks!).

The first fundamental reason why law is hard

Much like in this case, stripping law of interpretation and converting it into code is an open problem that perhaps can’t be solved, because law was arguably made that way. For example, in the above, the interpretation of the Fourth Amendment was brought into question as to whether it applied to the States or not! There is an open-source effort of codifying law based on every ruling to make a “current” interpretation of the law, but still, because the law is non-monotonic, this needs to be constantly updated.

However, even once we have a current interpretation of the law, all the justifications and arguments for each “interpretation” need to be maintained for it to stay correct, so I’m curious how far this can go!

Now, we did discuss why plain logic will fail in this situation. Is there a logical framework with which it is practical to act as a judge? The paper that first tackled this, or a slightly simplified version of it, is “On the acceptability of arguments and its fundamental role in non-monotonic reasoning, logic programming and n-person games”.

This paper was also published/cleaned up here, which I will reference.

For the following 2 papers, I don’t think I would have understood them properly without this presentation from the 2nd paper’s author and this very good YouTube video on a separate paper. It covered the fundamentals very well.

On the acceptability of arguments and its fundamental role in non-monotonic reasoning, logic programming and n-person games

The implementation is here. The purpose of this paper is to examine how humans resolve arguments and to build a framework around this.

The first principle the paper mentions is “The one who has the last word laughs best”.

For example, the paper gives an example of a government I and an organization A:

I: “I cannot negotiate with A because they don’t even recognize I”

A: “I does not recognize A either”

Here, in the initial argument, I places the blame on A for blocking the negotiation.

This places the blame fully on A unless it is justified. In A’s counter-argument, since I didn’t recognize A either, by the same reasoning I’s attack against A is nullified. However, neither side has won. Now, if I were to say:

I: “But A is a terrorist organization”

This justifies I’s failure to recognize A, which I find pretty interesting. At least so far, it feels more superficial than, say, logic and math, in that it only tackles the surface-level arguments and none of the foundational issues.

However, I think the goal of this paper is to evaluate the arguments that were actually given, not to account for new arguments per se.

Does this mean that we can’t establish a deep belief from arguments?

While a bit philosophical, the paper argues that a statement is believable if all attacking arguments against it can be nullified. This means that in this logic framework, axioms are more like pillars that are constantly fighting off attacks. If a pillar fails against an attack, then it fails as a concept and is no longer an axiom. To quote the paper, “A defeasible statement can be believed only in the absence of any evidence to the contrary”. One thing to keep in mind is that a law is not a statement here, as there should be no argument that nullifies a law, at least not in this problem. Interpretations of laws/constitutions can be statements, but not the laws/constitutions themselves.

Another important thing to know is that arguments are not directly equivalent to nonmonotonic logic either, since here you need to decide which side won the argument, while in nonmonotonic logic the result can be inconclusive!

The paper’s first stated goal is to develop a theory of argumentation and the acceptability of arguments.

Argument Framework

The definition of an argument framework, AF, is a pair ⟨AR, attacks⟩, where AR is a set of arguments and attacks is a binary relation on AR (attacks ⊆ AR × AR).

So basically, we have a matrix of which arguments attack which other arguments.

For the argument between I and A, given

I: “I cannot negotiate with A because they don’t even recognize I”

A: “I does not recognize A either”

I: “But A is a terrorist organization”

Given the arguments as (i₁, a, i₂), we have

attacks = {(a, i₁), (i₂, a)}

where given a pair (x, y), we say x attacks y.

Now, given this structure, how do we decide what arguments are good? For example, in this case it’s clear I wins but what if we go to 100s and 1000s of arguments?

The first idea this paper had was to find groups of arguments, in particular, arguments that don’t attack each other! These are called conflict-free sets.
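As a sketch (the argument names are mine), here is the I/A exchange as a framework in Python, with a brute-force check for conflict-free sets:

from itertools import combinations

AR = {"i1", "a", "i2"}                 # I's opener, A's reply, I's comeback
attacks = {("a", "i1"), ("i2", "a")}   # a attacks i1; i2 attacks a

def conflict_free(S):
    # No argument in S may attack another argument in S
    return not any((x, y) in attacks for x in S for y in S)

for r in range(len(AR) + 1):
    for S in combinations(sorted(AR), r):
        if conflict_free(S):
            print(S)   # (), ('a',), ('i1',), ('i2',), ('i1', 'i2')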

Then, I think we are starting to develop a vague idea of what this framework will output. Given a bunch of arguments, we will get possible conclusions for a given legal case. Most likely, we will get:

  1. The arguments on the side of the defense and their conclusion
  2. The arguments on the side of the prosecution and their conclusion

However, what is still missing here? We are missing how well these arguments hold up! In particular, let’s say that on the lawyer’s side, while there are plenty of arguments, the prosecution dismantled all of them. In addition, let’s say the lawyer had no counter-arguments to any of the prosecutor’s arguments.

Then, even if the conflict-free sets are the same size, the lawyer’s side should be losing. Here is where the idea of “accepting” arguments comes in. An argument A is acceptable with respect to a set S if, for every argument that attacks A, some member of S attacks that attacker back. An admissible set is then a conflict-free set whose members are all acceptable with respect to it.

So, we end up choosing all the non-refuted arguments from both sides! If we look at the admissible sets, we should be able to get the main unrefuted arguments from both sides.
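Continuing the same toy sketch (redefined so the snippet stands alone), acceptability and admissibility are only a few lines each:

AR = {"i1", "a", "i2"}
attacks = {("a", "i1"), ("i2", "a")}

def conflict_free(S):
    return not any((x, y) in attacks for x in S for y in S)

def acceptable(arg, S):
    # Every attacker of arg must itself be attacked by some member of S
    attackers = [x for (x, y) in attacks if y == arg]
    return all(any((d, b) in attacks for d in S) for b in attackers)

def admissible(S):
    # Conflict-free, and S defends every one of its own members
    return conflict_free(S) and all(acceptable(arg, S) for arg in S)

print(admissible({"i1", "i2"}))  # True: i2 defends i1 against a's attack
print(admissible({"i1"}))        # False: nothing defends i1 from a
print(admissible({"a"}))         # False: a cannot defend itself from i2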

However, how do we find these maximal sets out of the sea of arguments? Here is where the extensions come in. The first main extension introduced in most papers is the “preferred extension”:

A preferred extension is a maximal (with respect to set inclusion) admissible set. But can you see how there can be multiple preferred extensions in an argument?

Here, taking the arrows to mean attacks, the 2 preferred extensions are

  1. (A1, A2, A4, A5)
  2. (A1, A2, A3, A6, A7)

Why can we say both are maximal? Because we can’t compare them: neither set contains the other! Set inclusion only gives a partial order.

Now, what does this mean?

Nixon’s Diamond

Nixon was a Republican Quaker. So we have 2 arguments:

A: “Nixon is anti-pacifist since he is a republican”,

B: “Nixon is a pacifist since he is a quaker”

Then we have attacks = {(A, B), (B, A)}

Then what is the preferred extension? We have 2: one is {A} and the other is {B}, since both of the “largest” admissible sets have one element! This is called the credulous approach, in that you are more than happy to give both sides of the story a shot.
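Here is the Nixon diamond in the same sketch style, enumerating preferred extensions by brute force; fine for a toy, but the exponential enumeration already hints at the complexity problem discussed later:

from itertools import combinations

AR = {"A", "B"}
attacks = {("A", "B"), ("B", "A")}    # the two arguments attack each other

def conflict_free(S):
    return not any((x, y) in attacks for x in S for y in S)

def acceptable(arg, S):
    attackers = [x for (x, y) in attacks if y == arg]
    return all(any((d, b) in attacks for d in S) for b in attackers)

def admissible(S):
    return conflict_free(S) and all(acceptable(arg, S) for arg in S)

subsets = [set(c) for r in range(len(AR) + 1)
           for c in combinations(sorted(AR), r)]
adm = [S for S in subsets if admissible(S)]
preferred = [S for S in adm if not any(S < T for T in adm)]
print(preferred)   # [{'A'}, {'B'}]: two incomparable maximal admissible sets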

So what the preferred extension does is let us know the credulous arguments.

In contrast, in a skeptical approach, only the parts where all preferred extensions agree are given. For this, we mainly want the intersection of all preferred extensions (if finite)! To do this, we make a function

F(S) = { A : A is acceptable with respect to S }

So this gives all the arguments which are acceptable with respect to our set of arguments S. That is, for any argument that attacks A, some member of S attacks back to defend it, which makes A defended and thus accepted.

The extension that, using this function, gets us the skeptical arguments is the grounded extension, defined as the least fixed point of F.

Now what is a fixed point? The fixed point, in my understanding, works like this: you start by feeding F the empty set, and then keep putting the output back into F until we reach a “fixed point” where our set stops growing!

Now, intuitively, why does this give our skeptical set? Initially, we put in the empty set. Then only arguments that have never been attacked are output, because the empty set cannot defend any arguments. Next, if these unattacked arguments attack other arguments, and in doing so defend some arguments B′, then B′ is added to our set. However, it’s important to note that B′ is, given the current arguments, always true! If B′ is fully defended by arguments that were never attacked, then B′’s conclusion holds regardless of “side”.
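Here is a sketch of that iteration on a small hypothetical chain, where A is unattacked, A attacks B, and B attacks C (so A defends C):

AR = {"A", "B", "C"}
attacks = {("A", "B"), ("B", "C")}

def F(S):
    # The characteristic function: everything acceptable w.r.t. S
    out = set()
    for arg in AR:
        attackers = [x for (x, y) in attacks if y == arg]
        if all(any((d, b) in attacks for d in S) for b in attackers):
            out.add(arg)
    return out

S = set()
while F(S) != S:   # iterate until the set stops growing
    S = F(S)
print(S)   # {'A', 'C'}: first the unattacked A, then C, which A defends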

The other extensions are the stable extension, a conflict-free set that attacks every argument not belonging to it, and the complete extension, an admissible set that contains every argument it defends.

Now, all stable extensions are preferred extensions, but not necessarily the other way around, for example when arguments attack themselves. When a preferred extension is not a stable extension, the argument framework is not called “coherent” and is assumed to have anomalous arguments.

Here, three theorems from the paper explain this best: every preferred extension is a complete extension (but not the other way around), the grounded extension is the least complete extension, and the complete extensions form a complete semilattice.

Now, let’s look at this example again

So for our preferred extensions,

  1. (A1, A2, A4, A5)
  2. (A1, A2, A3, A6, A7)

are both complete extensions too, since A4 is acceptable w.r.t. (A1, A2, A5), so it’s in the group! However, we also have

3. (A1, A2)

Here, every new argument that this set can defend is already in the group, so this is also a complete extension! So the complete extensions include both the grounded extension and the preferred extensions. And, as the theorem says, the intersection of 1 and 2 is 3!
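Since the A1…A7 figure isn’t reproduced here, here is the same check run on the Nixon diamond instead: its complete extensions are the empty set (the grounded extension) and the two preferred extensions, and intersecting the preferred ones indeed gives the grounded one:

from itertools import combinations

AR = {"A", "B"}
attacks = {("A", "B"), ("B", "A")}

def conflict_free(S):
    return not any((x, y) in attacks for x in S for y in S)

def acceptable(arg, S):
    attackers = [x for (x, y) in attacks if y == arg]
    return all(any((d, b) in attacks for d in S) for b in attackers)

def complete(S):
    # Admissible AND already contains everything it defends
    adm = conflict_free(S) and all(acceptable(a, S) for a in S)
    return adm and all(a in S for a in AR if acceptable(a, S))

subsets = [set(c) for r in range(len(AR) + 1)
           for c in combinations(sorted(AR), r)]
print([S for S in subsets if complete(S)])   # [set(), {'A'}, {'B'}]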

Now, here I’d like to point out the 2nd fundamental reason why law is hard

The second fundamental reason why law is hard

This was pointed out by singh. Thanks again! Once we have the two sides, the prosecution and the defense, as two argument structures, and we have to decide on a winner, we need a judge. That is, we need a person to come up with some arguments (given in their “opinion”) on which side won, which attack the other extension. This is a fundamental issue because currently we have no way of resolving this without bias.

But now back to the paper.

For arguments to be resolvable, we don’t want chains of attacks to go on forever! This is called being well-founded: an argumentation framework is well-founded if there is no infinite sequence of arguments A₁, A₂, A₃, … where each Aᵢ₊₁ attacks Aᵢ. The paper shows that every well-founded argumentation framework has exactly one complete extension, which is grounded, preferred, and stable all at once.

There are still a lot of very interesting parts of this paper that I skipped because I don’t understand them quite yet. But I will update this blog when I get the chance!

Now, we have an understanding of the foundational paper on argument frameworks and how we can “accept” arguments. But how can this be implemented, and what is its actual speed? For this discussion, we will look at “An Answer Set Programming Approach to Argumentative Reasoning in the ASPIC+ Framework”.

An Answer Set Programming Approach to Argumentative Reasoning in the ASPIC+ Framework

In this paper, a concept called Answer Set Programming is connected to the above argument framework approach, in order to pin down the time complexity of the framework and to run timing tests!

What is Answer Set Programming?

The best resource I could find for this is the amazing Wikipedia. The idea of this style of programming is very simple.

Given a set of boolean atoms, the body, we derive a boolean output, the head:

<head> :- <body> .

In addition, several constraints can be set to prevent certain atoms from being true when others are false. Are you starting to see how this connects to our arguments?

This technique is not just used for arguments but also for other problems, like coloring the vertices of a graph so that no two adjacent vertices get the same color, or finding the largest set of pairwise adjacent vertices (the maximum clique problem).

And I think you are noticing a theme: these problems tend to be pretty hard problems in computer science. Do correct me if I’m wrong, but both of these seem to be NP-complete problems, where a solution is verifiable in polynomial time, but for finding a solution the best known methods are close to trial and error. And yes, it turns out that for arguments, it’s the same.
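To give a taste of the guess-and-check style, here is a sketch of 3-coloring as an ASP program run through the clingo Python package (pip install clingo); the four-vertex graph is made up for illustration:

import clingo

program = """
vertex(1..4).
edge(1,2). edge(2,3). edge(3,4). edge(4,1).
color(red; green; blue).
% guess exactly one color per vertex
1 { assign(V, C) : color(C) } 1 :- vertex(V).
% check: adjacent vertices must not share a color
:- edge(V, U), assign(V, C), assign(U, C).
#show assign/2.
"""

ctl = clingo.Control(["1"])    # ask for one answer set
ctl.add("base", [], program)
ctl.ground([("base", [])])
ctl.solve(on_model=print)      # e.g. assign(1,red) assign(2,green) ...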

Now, here’s the third fundamental issue with law, which I mainly focused on in this article.

The third fundamental reason why law is hard

By exponential, we mean that every time we introduce a new argument, we need to recalculate essentially everything from scratch. That is, if we calculate and find the clusters for 600 arguments, then when we add 1 argument, it’s roughly equivalent to recomputing from scratch! There may be improvements in practice, but that is the general idea of what NP-complete implies here. So, essentially, below we see that resolving 5000 arguments can be done in a relatively short time, but adding just a few arguments on top needs insane amounts of time. This is, I argue, the third fundamental problem with law, because it makes resolving arguments extremely expensive.

However, if this were resolved, one possible scenario is for all the cases and arguments to be put into clusters, and for judges to collectively decide which cluster is correct for all historical cases, which would be the best possible outcome of this research.

But before that, how do we connect this answer set programming with our argument framework?

Connecting ASP and AF

One of my favorite parts of what the authors (or their prior works) did was that, first, they separated axioms from ordinary premises. That is, they separated what always holds from what is merely our hypothesis. I think this is very valuable, say, in the legal system, where in a typical case we don’t want to argue against the laws themselves, but we do want to argue about everything else. Formally, the knowledge base is split into axioms, which cannot be attacked, and ordinary premises, which can.

Another interesting part was allowing rules to be defeasible or strict, as the authors call them. The output of a defeasible rule is a hypothesis that can be attacked, while the output of a strict rule is always correct.

A strict rule is the exact same kind of rule as the <head> :- <body> form mentioned above!

Finally, the authors did not use arguments directly, but used a more layered structure where the conclusions of arguments are the statements, and arguments are built from sub-arguments with their own conclusions, like a tree! Overall, this translates our argument graph

to

Here, it’s a bit hard to parse, but A3 has conclusion b, A6 has conclusion x with sub-argument A3, and A7 has conclusion z with sub-argument A6, and so on! The arrows are the attacks, the dotted boxes are the ordinary premises, the solid boxes are the axioms, the dotted lines between boxes are the defeasible rules, and the solid lines are the strict rules.

Now, given all this, the authors wrote the corresponding ASP encoding.

For in and out, the idea is that each argument is guessed to be either “in” or “out” of the extension (thanks to this youtube vid). So, in a way, it’s similar to vertex covering.

I think if I get more intuition for the code, I will write more here. For now, let us look at the timing tests!
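I can’t reproduce the authors’ exact listing here, but a classic ASP encoding of stable extensions from the argumentation literature looks like the following sketch (again assuming the clingo Python package), run on the Nixon diamond from earlier:

import clingo

encoding = """
arg(a). arg(b).
att(a, b). att(b, a).

in(X)  :- arg(X), not out(X).      % guess: argument X is in the extension
out(X) :- arg(X), not in(X).       % ... or it is out
:- in(X), in(Y), att(X, Y).        % check: the extension is conflict-free
defeated(X) :- in(Y), att(Y, X).   % X is defeated if attacked from inside
:- out(X), not defeated(X).        % check: every outsider must be defeated
#show in/1.
"""

ctl = clingo.Control(["0"])    # enumerate all answer sets
ctl.add("base", [], encoding)
ctl.ground([("base", [])])
ctl.solve(on_model=print)      # two models: in(a), and in(b)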

Timing Tests Result

The percentage is the proportion of axioms. Interestingly, the growth looks a bit exponential. Still, overall, for 5000 atoms we can find these argument clusters in just 100 seconds or so. I am very curious whether we can compress arguments to stay within such a limit and work with that!

But you might be curious: in the legal domain, do we always have to care about contradictions and so on? Can’t we just have “a current interpretation of the law” based on previous cases and just apply it? In some fields of law, the answer is yes!

HYPO

This was an expert system released in 1987 for deciding trade-secret law cases! For this particular legal field, decisions are very case-based, so while HYPO can’t handle fully argumentative logical cases, it can be good enough, and it is regarded as a classic legal-AI approach. However, one issue is that to justify why it reached a decision, HYPO can only point to the old case, not give reasoning specific to the current case.

When I was checking “HYPO’s legacy: introduction to the virtual special issue”, it seemed as though HYPO evolved into more of an ASPIC-like framework over time, which does make sense, as this feels very similar to putting the previous cases’ atoms into the axioms and just using those.

However, one issue here is that if we were to put all previous cases into axioms, although they would be strict and thus comparably fast, it would be a huge number of axioms, since we would need to encode our entire legislative history and all the arguments.

To answer how to make this practical, “Induction of Defeasible Logic Theories in the Legal Domain”, which was interested in finding the minimum number of rules needed to reach a conclusion, came up with the idea of:

  1. Come up with a conclusion
  2. Greedily select rules from the facts to reach that conclusion

The method of selecting a rule is to find the best rule to apply that is common across all the cases in the dataset, which gives a better measure of objectivity.
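Here is a sketch of that greedy loop as I understand it; the rule interface (a hypothetical .matches method) and the scoring are my own simplification, not the paper’s exact algorithm:

def induce_rules(cases, candidate_rules, conclusion):
    # cases: list of (facts, outcome); each rule exposes a hypothetical .matches(facts)
    selected = []
    remaining = [c for c in cases if c[1] == conclusion]
    while remaining and candidate_rules:
        # greedily pick the rule covering the most still-unexplained cases
        best = max(candidate_rules,
                   key=lambda r: sum(r.matches(f) for f, _ in remaining))
        candidate_rules.remove(best)
        selected.append(best)
        remaining = [c for c in remaining if not best.matches(c[0])]
    return selected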

I am not sure if this is valid, since I think it can lead to intermediate sets having contradictions, but if we did this on a complete extension within each section, I think it could be an interesting direction.

Overall, there seems to be a tradeoff between allowing contradictions and speeding up performance.

Now, what about the data?

Data Issue

So far I have only checked US court data, but while a lot of the data is online in a service called PACER, much of it is behind a paywall.

In particular, PACER charges per page of each document (on the order of 10 cents a page). So, essentially, even to access one case fully, I think you should expect to pay at least 5 dollars. The reason it’s like this is that Congress declined to fund the project, so it relies on user fees. Though I’m honestly impressed it costs this much. However, one solution I found is a nonprofit called the Free Law Project, whose main goal is to make the law free for everyone. For example, for the Mapp v. Ohio case I mentioned above, I got the judges’ ruling reason, called the opinion, from here.

The method they use to get this data is

  1. Have users download their browser extension
  2. When users access PACER, that data is sent to a website called CourtListener and hosted there

However, there are still issues. In particular, even for a big case like Mapp v. Ohio, I don’t have access to the main court documents, just the judges’ ruling decisions/opinions.

Potential Low Hanging Fruit

CourtListener has a large collection of oral arguments, in which the judges argue with the defense and prosecution to refresh the main points of the case. I listened to a few, and if transcribed, they may be an approximation of the main documents, although much shorter.
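As a sketch, transcribing one of these recordings takes only a few lines with openai-whisper (pip install openai-whisper); the file name below is a placeholder for audio downloaded from CourtListener:

import whisper

model = whisper.load_model("base")              # small model, quick first pass
result = model.transcribe("oral_argument.mp3")  # placeholder file name
print(result["text"][:500])                     # rough transcript of the hearing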

However, in open source, there is a 256 GB dataset on law called Pile of Law. Where is this data from?

Pile of Law

Pile of Law was published by Stanford around November 2022 in the paper “Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset”. One part I found interesting about this paper was that the main focus seemed to be filtering offensive/toxic content out of datasets in general, which does seem to be a focus at Stanford, for example when they identified CSAM in LAION-5B. While the approach they used for this was interesting, for this blog I’ll focus on the data sources.

The authors also use CourtListener data (until 2018), as well as some interesting data I didn’t know existed. The authors scraped 6 main categories of data:

  1. Legal Case Opinions and Filings.

This is where the CourtListener data comes in: opinions (the judge’s explanation of the ruling) and the main legal filings, called dockets.

There is also data here on veterans’ appeal decisions, and on FTC opinions that companies request from the FTC to see if they will get sued.

2. Legal analysis

This data includes official legal counsel for the president on which actions are and are not permissible, which the authors say is similar to opinions, as well as reports by an independent overseer of the Justice Department.

3. Laws

Essentially, the authors just scraped constitutions and laws. I’m not sure if they were able to scrape every law the United States has ever had.

4. Contracts

Basically credit card agreements and business contracts.

5. Conversations

US Congress hearings. This is possibly not relevant to our specific problem, since Congress is responsible for making laws while we are mainly concerned with applying them. But this might give more insight into the intent behind laws.

Also, interestingly, there is a “U.S. Supreme Court Oral Argument Transcripts” subset, where the judges probe the main arguments to reorganize the case, which I think is highly valuable.

Also, interestingly enough, Reddit’s r/legaladvice and r/legaladviceofftopic are apparently considered good data sources, which I found pretty funny.

6. Study materials

This is just bar exam outlines and open-source casebooks. Especially the latter sounds very interesting, as commentary is added to each case for expert analysis.

Now, all of these are labeled here. While this is highly valuable, I think the main limitation of the PACER-to-CourtListener transition still exists, in that the main documents of the court dockets are significantly more expensive and thus, I’m guessing, not sufficiently present in this data source.
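As a sketch, the dataset can be sampled from the Hugging Face Hub with the datasets library; the subset name below is one I believe exists, but check it against the dataset card (a streaming pass avoids downloading all 256 GB up front):

from datasets import load_dataset

# stream one subset rather than downloading the whole corpus
ds = load_dataset("pile-of-law/pile-of-law", "r_legaladvice",
                  split="train", streaming=True)
for example in ds:
    print(example["text"][:500])   # each record carries the raw document text
    break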

In addition to moving this data into the ASP framework, another vital part that is missing, and which may be addable with post-processing, is the state of the law at each point in time for all these documents, especially since the law is constantly changing and legislative decisions change the interpretation of law nationwide.

However, how is this dataset used for AI currently? For large language models in law, at least when reading “Large Language Models in Law: A Survey”, the main country that seems interested in implementing this is China, with models such as

  1. LawGPT_zh
  2. LexiLaw
  3. Lawyer LLaMA
  4. HanFei
  5. ChatLaw
  6. Lychee
  7. WisdomInterrogatory
  8. JurisLMs
  9. Fuzi.mingcha

All trying to make Chinese law more accessible with LLMs. And in the paper “The Smart Court — A New Pathway to Justice in China?”, it seems like China is going all-in on automated justice, which has “promoted easier access to justice, enabled faster dispute resolution, saved costs by moving the judicial process online and ensured that judgments can be enforced.”

So the main player for law in AI seems to be China, not the United States.

However, in the context of the Pile of Law, I wanted to mention a project by CarperAI called “Legal Data Reward Modeling for RLAIF”.

Legal Data Reward Modeling for RLAIF

I just wanted to mention this project since it takes a slightly different approach from the LLM training on law I have seen so far. Overall, at least for the Chinese LLMs above, as far as I’ve seen, most of the advances follow the typical

  1. Pretrain
  2. Make or use an instruct dataset in the legal domain for supervised fine-tuning (e.g., lawyer-llama)

combo, without many features specific to the law field. Do correct me if I’m wrong!

However, in CarperAI’s case, under johnjnay, they seem to have a rather interesting approach, probably from the RLAIF paper:

  1. Make each action in the dataset a (state, action, reward) triple using, say, GPT-4 or some other LLM
  2. Supervised fine-tuning. While this seems similar to the above, one key difference is that the output is the legal validity of the current action given the state!

So now they can do reinforcement learning from AI feedback, where the model can figure out the most “legal” action, which I thought was pretty clever.
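As a rough sketch of what such a triple could look like (the field names, prompt, and query_llm helper below are hypothetical, not CarperAI’s actual schema):

example_state  = "Tenant has not paid rent for 3 months; the lease requires 30 days' notice."
example_action = "Landlord changes the locks without giving notice."

def legal_reward(state, action):
    # Ask an LLM judge to score the legality of the action in this state.
    prompt = (f"Situation: {state}\nAction: {action}\n"
              "Score the action from 0 (clearly illegal) to 1 (clearly legal).")
    return float(query_llm(prompt))   # query_llm: hypothetical wrapper around GPT-4 etc.

# (state, action, reward) triples like this then train a reward model,
# which guides RL fine-tuning toward the most "legal" actions.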

Conclusion

This blog mainly acts as a literature review/explanation of AI in law. Currently, the main challenges of AI in law seem to be:

  1. At least in the US, prohibitively expensive access to court case data
  2. The NP-complete computational complexity of argumentation in the ASPIC framework
  3. Lack of consideration of laws changing with time
  4. Lack of connection between logic frameworks and LLMs

Currently, my guess is that we need to offload some of the reasoning done in ASPIC to an LLM or another AI for a “cheaper approximation” of argument conclusions and reasoning. I don’t think any paper has done this yet, but for general uses like drafting contracts and finding evidence, this is already solved. For replacing judges, even if the ASPIC+ framework were a polynomial-time algorithm, I don’t think that would be enough if we cannot conclude, without a judge, which side won.
