Alex and Evelyn sit down with Riana Pfefferkorn and David Thiel of the Stanford Internet Observatory to talk about the legal and technical issues coming to a head with the explosion of computer-generated child sexual abuse material.
Moderated Content is produced in partnership by Stanford Law School and the Cyber Policy Center. Special thanks to John Perrino for research and editorial assistance.
Like what you heard? Don’t forget to subscribe and share the podcast with friends!
Evelyn Douek: Generally on the idea of virtually generated child pornography from 2002, I think that so much has changed in terms of the problems that we're talking about, that this is going to spark a whole bunch of new lawsuits and legal challenges and First Amendment issues, especially when it's like hyper-realistic and does depict an actual person.
Alex: I can't wait for the Supreme Court to argue about what the founders thought about linear algebra matrices with 75 billion parameters in it. What did Thomas Jefferson believe when he [inaudible 00:00:32]-
Evelyn Douek: Yeah, I discovered notes in his library. We found the sheet of paper. It answers all the questions. So yes. No, exactly. I mean, I think that the law isn't well-equipped to deal with a lot of these issues.
Hello and welcome to Moderated Content's stochastically released, slightly random and not at all comprehensive news update from the world of trust and safety with myself, Evelyn Douek, and Alex Stamos. Today we want to do somewhat of a follow-up conversation to last week's conversation about the congressional hearing, looking into child sexual exploitation online, and talk a bit about the legal frameworks here, the constitutional frameworks and the technological issues that are at play.
And to do so, we're joined by two of our own. We are joined by Riana Pfefferkorn, who got name dropped a bunch last week, so it's great to have her on the show. She's a research scholar at the Stanford Internet Observatory and recently released a paper, which we'll link to in the show notes, titled Addressing Computer Generated Child Sexual Abuse Imagery: Legal Framework and Policy Implications. So thanks very much for joining us, Riana.
Riana: Thank you so much for having me here.
Evelyn Douek: And we also have David Thiel, also name dropped last week. He's SIO's chief technologist, and he's going to talk to us about some of the technological background and capacity here. So thanks for joining us, David.
David: Thanks for having me.
Evelyn Douek: Great. Okay, so let's start with a background question about why is this issue coming to a head now? What are the technological advances that are bringing the harms to the forefront and the constitutional issues to a head in this particular moment? What's changed in the last few years?
David: Historically, the ecosystem for detecting and mitigating child sexual abuse material has been focused on the recirculation of existing content. New content is produced slowly over time, but it's largely an issue where there's fairly simple technology to do fuzzy match detection on known instances of CSAM. That's what most online service providers do to help prevent it being published and distributed on their platforms. What's happened in the last couple of years is that with the advent of generative machine learning, there have been models released into the public domain that are now easy to modify and retrain, and were trained on a significant amount of explicit material as well as imagery of children. And in some cases, as we published in a paper recently, those two things also overlap in some of the training data that these models have been trained on.
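To make the fuzzy matching David describes a bit more concrete, here is a minimal sketch of the general idea: compute a perceptual hash of an incoming image and compare it, with a small tolerance, against hashes of previously confirmed material. This is not PhotoDNA itself, which is not public; it uses the open-source imagehash library, and the hash value, threshold, and function name are illustrative assumptions.

```python
# A rough sketch of "fuzzy" (perceptual) hash matching against a list of
# known hashes. Uses the open-source imagehash library; the hash value and
# threshold below are made-up placeholders for illustration only.
from PIL import Image
import imagehash

# Hashes of previously confirmed instances (hypothetical values).
KNOWN_HASHES = [imagehash.hex_to_hash("f0e4c2d8a1b3957e")]
MATCH_THRESHOLD = 8  # max Hamming distance still treated as a match

def matches_known_content(path: str) -> bool:
    """Return True if the image is a near-duplicate of any known instance."""
    candidate = imagehash.phash(Image.open(path))
    # Subtracting two hashes yields the Hamming distance between them.
    return any(candidate - known <= MATCH_THRESHOLD for known in KNOWN_HASHES)
```

The point of the tolerance is exactly what David notes: re-encoded, resized, or lightly edited copies of a known image still land within a few bits of the original hash, whereas genuinely novel material does not match at all.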
So it used to be that significantly modifying a generative model would require thousands of dollars at minimum, maybe tens of thousands of dollars. Now we're at the point where even if you don't have a particularly fancy GPU, you can still generate explicit imagery on your local computer, where you don't have to worry about these online gatekeepers trying to filter your prompts or having output filters, things like that. And what's happened as a result is that we've seen this kind of explosion of computer-generated imagery, some of which attempts to be photorealistic. There's this above-ground community that produces explicit adult content and tweaks models, but that work has been repurposed by people who have an interest in producing CSAM, and now there is a large volume of that content getting produced and distributed.
Alex: One of the things you found, David, that I think we should reinforce for folks is, so these are, just to name them, we're talking about Stable Diffusion 1.5 mostly, right? Is that the model that is still the most popular?
David: Yes, although Stable Diffusion XL is becoming increasingly popular among people that are producing explicit content.
Alex: So you take Stable Diffusion 1.5 or Stable Diffusion XL, it gets retrained on adult pornography, one would assume for adult purposes. But even then you see that some of these models that are outputted, that the people who are producing them then say things like, "Oh, don't forget to put minus child in because you'll accidentally create CSAM." So there are obviously situations where people are creating models that are specialized, but even the non-specialized models seem to be useful for the creation of this content in some cases.
David: Right. And so we're now at a place where things have very much changed. We can't just rely on this fuzzy match detection of known instances because every day there's just thousands of new instances being churned out, which produces a lot of technical challenges for platforms. But we've also seen that people are very fuzzy about the legal implications as well.
Alex: Right. And I think this is a good thing to reinforce, because we've kind of inverted the pyramid here of the CSAM ecosystem. When you and I were going to the Crimes Against Children Conference even seven, eight years ago, David, there was still discussion of a huge amount of content that was created in the '70s and '80s and converted from film, where people scanned their negatives. While there was continuous abuse of children and terrible things happening, the number of people who were creating content with real children was much smaller than the number of people consuming it. And now the skillset has trickled down to the point that a huge chunk of the people who consume this content also have the skillset to create it. Which means, for one, that they can create content that fits their specific purposes. But it also has the effect of possibly not getting caught, because the entire ecosystem is based upon the idea that 99% of the content is recirculating and every year you're only adding a little bit of new stuff.
Evelyn Douek: Right. And just before we get to the legal frameworks as well, just to sort of be clear about the technological developments and advancements that we're talking about. So part of it is it's become much easier and more accessible for a wider group of people to be able to use this technology and produce these images. But I guess one of the critical questions here is going to be how realistic are we talking at this point and how indistinguishable is it from the real thing? So in terms of, is this the kind of thing where the vast majority of this now, just from looking at it, you can or can't tell if it's virtually generated?
David: From what we have heard from child safety groups and from law enforcement, the percentage which they really can't distinguish is still fairly low. But those are people that are primed to be examining it to try and distinguish it because they have an immediate need. They need to know if there is a child being exploited that they need to actually take action on.
For people that are not primed on whether they are looking for generated content or not, what you can see with legal explicit content that's been generated is that it actually does fool a fair number of people. You'll get people posting images on Reddit and being like, "Does anybody know who this model is? I want to follow their OnlyFans." And everybody has to jump in and be like, that's not a person. So we know that, technologically, the ability to have your average person think that this is a real person is there.
When it comes down to we really can't distinguish by any mechanism, technological or otherwise, it's a smaller but growing percentage.
Evelyn Douek: Right. Okay. So these are new aspects of the problem, but as Alex was saying, this isn't a new problem entirely. This is something that has been around for a long time in various forms and indeed is something that the courts and the Supreme Court have addressed before actually decades ago.
So Riana, why don't you start by walking us through the rough constitutional framework here about why or why not this material can or cannot be criminalized?
Riana: Sure. So it's probably familiar to you, Evelyn, that the First Amendment generally doesn't allow prohibiting that much speech. There are a few categories that are totally unprotected by the First Amendment, and one of those categories is material that is obscene as measured by the three-part Miller test. And separately, perhaps counterintuitively, CSAM is unprotected for the separate reason that it is speech that is integral to criminal conduct, which is to say the sexual abuse of children. So you have two different rationales for prohibiting potentially some or most of the material that we're talking about today in... Go ahead.
Evelyn Douek: The listener's going to go, obscene? How can you tell me that this isn't obscene material? If you think about the common meaning of obscenity and what is obscene, we're talking about virtually created but hyper-realistic images of children engaging in sexual conduct. How is it possible that that is not obscene?
Riana: Right. And often the courts have said, usually in the context of actual photographic CSAM, that a lot of CSAM is going to be obscene, because the three-part test for obscenity under Miller versus California, which came out 50 years ago, is that the work has to appeal to the prurient interest taken as a whole, as judged by the average person applying contemporary community standards. It has to depict or describe sexual conduct in a patently offensive way, and that's going to cover a lot of sexual material involving children. And taken as a whole it has to lack serious literary, artistic, political or scientific value, which may often also be the case.
So there will be a lot of overlap, not perfect overlap, between material that depicts sexual abuse of children, whether it is actual hands-on real-person abuse or whether it is computer generated, and something being obscene. And so the, I think, low-hanging fruit for trying to prohibit computer-generated CSAM, as the paper discusses, would be to try and prohibit it under a rather little-used statute on the federal books that prohibits child obscenity. However, some amount of material might not be obscene. So we have seen appeals court cases saying, for example, that a computer-generated image depicting older-looking teenagers involved in what might be consensual conduct would not be obscene. Or to take another more theoretical example, something might not be lacking in serious literary, artistic, political or scientific value if it is a journalistic image depicting a soldier sexually abusing a child in a war zone, for example.
So there is not perfect overlap between those categories and we have to worry more about the First Amendment's contours where we're talking about that sliver that is potentially falling in between those cracks.
Evelyn Douek: Yeah. So tell us about that category then. Because one of the things that I found super interesting in your paper is how little used the obscenity provision was. And I guess that's in part because there is this other provision available. So what's the other part of the legal framework here?
Riana: So the federal law on the books prohibiting child pornography, to use the terminology that is still used in the law, applies to material that is actually produced using real children; material that is a morphed image, which takes a non-sexual image of an actual identifiable child and morphs it onto an adult porn actor's body or another image of CSAM of actual children; and then this third category, under the law on the books right now, of material that is indistinguishable from actual child abuse imagery, which is to say the sort of photorealistic imagery that we've been talking about.
That's where I think Congress strayed into constitutionally dubious waters when they enacted that language in 2003, because what they were responding to was a Supreme Court decision from the previous year that had struck down the previous definition under that prong of the definition of CSAM, which covered computer-generated images that are or appear to be of a minor engaged in sexually explicit conduct. And the Supreme Court said you can't constitutionally prohibit a computer-generated image that appears to be of a child but does not involve the actual abuse of any real children. What the Supreme Court called virtual child pornography is constitutionally protected.
And so that is something that's going to present, I think, a conundrum for prosecutors. For platforms who are required to report CSAM under federal law, how are you supposed to know whether something is an actual image or not? As David said, there is at least a small, perhaps growing, number of images that do look photorealistic even though they were actually produced using generative AI.
Alex: What kind of rights do individuals have here in the civil court system? Because there's lots of speech that the government can't prohibit, but people could be held civilly liable for. We're right on the tail end of this massive amount of deep fake pornography about Taylor Swift. So as an adult, she has some rights. Is there a difference between the rights between her and a child and has this been tested at all?
Riana: So the civil remedies aren't something that I have studied in connection with this paper, but there is a law on the books, Section 2255, that allows somebody who was harmed while they were a minor by being depicted in CSAM to sue whoever it was that created that harm for them. I think it'll be interesting to see whether that expands to the generative AI context. There, we're talking about how you're even supposed to find the person who created this image to begin with. A lot of the time, from what we're seeing in the news, we see story after story of high school students creating pornographic deepfakes of every girl in their class and circulating them around all the other boys in class. Okay, there you can say, this is the person who made this and this is the person who's victimized and has the right to sue.
For the most part, though, historically it's been criminal enforcement by the Department of Justice under the criminal provisions prohibiting child pornography that has largely been used to bring offenders to justice. So I would expect, and hope, that we will continue to see sustained interest in this area now that we're talking about generative AI created images. With that said, it's been kind of an open question where the DOJ has been with regard to news story after news story. Especially if there isn't state law that adequately covers creating non-consensual deepfake pornography of teenage girls, but there is federal law that says that is very clearly illegal, why is it that the Child Exploitation and Obscenity Section of the Department of Justice hasn't been stepping in, so far as we know to date, to deal with these incidents that have been cropping up all over the country?
Evelyn Douek: And I guess I want to pause on this because we were talking about this earlier, these news stories after news stories, and often they will say there's nothing that can be done, there's no law criminalizing this kind of material. And in fact, we just talked about this law that criminalizes computer generated imagery that is indistinguishable from that of a minor engaging in sexually explicit conduct. But it is a kind of weird quirk. I sort of understand why people are surprised to find that this law exists when the Supreme Court did strike down a prior provision with very similar wording, saying that it offended the First Amendment. The chutzpah of Congress just re-passing this law, I guess. But I'm curious if you have any insight from writing the paper: why did they think that this one might survive constitutional scrutiny, and also why hasn't it been challenged, do you think, in the years since it was passed?
Riana: Yeah, so the wording that Congress used in revising the statutory language actually mimics language that Justice O'Connor used when she was writing separately to concur in the outcome in the Ashcroft v. Free Speech Coalition case, which had struck down the "appears to be" language from the 1996 law. So I think they were thinking in Congress, well, if we use the language that Justice O'Connor was using, that probably means if this gets challenged too, we will have at least one amenable justice on the court who will go along with this.
That said, at the time there were members of the Senate who were saying, "Look, this is just as unconstitutional as the previous language was. Now that the Supreme Court has weighed in, why are we passing this very, very similar language about material that's indistinguishable from an image of abuse of an actual child?" But if you look at the congressional findings in the 2003 Act, Congress was really mad at the Supreme Court for striking down that provision of the law, because they were very concerned, remember this is 20 years ago, that computer generated imagery would make it impossible to tell real material from virtual material. And so I think they basically just said, "Did we stutter?" and decided to try again here, and try and fit the revised language within a box that they thought would go over better with the justices.
And then in the intervening 20 years, we haven't seen a constitutional challenge to that "indistinguishable from" language, I think because it just hasn't really been necessary. Up until relatively recently, the technology didn't exist, or at least not cheaply, to generate truly photorealistic-looking material. It was generally going to be fairly readily distinguishable that something was not a real image. And for years and years, defendants have tried the Hail Mary pass of saying, well, this doesn't depict a real child, you have to prove that it does. And that hasn't worked out, because mostly all they do is say, "Well, Photoshop exists, therefore this image probably isn't real." And that hasn't been enough to persuade juries. Prosecutors have still been able to carry their beyond-a-reasonable-doubt burden and get convictions. And a lot of times these cases plead out anyway when they involve known real material.
Alex: I would love to see what a congressional staffer does with an acronym for the Did We Stutter Act. That would be pretty impressive.
Riana: I was just proposing this on Bluesky the other day, but it was with regard to the first-sale doctrine rather than CSAM. So stay tuned for that bill to come down soon.
But yeah, I mean, this is where we might finally have a situation where the technology has caught up with Congress's fears, and I'm kind of wondering whether we will see prosecutors start relying on that "indistinguishable from" provision, or whether instead they will look for other means of addressing material and bringing offenders to justice without potentially risking that revised provision of the law getting struck down too. And I think there are a few different options that prosecutors would have in this space. One of them is the child obscenity statute, which hasn't been used very much, in part because it requires prosecutors to satisfy that three-pronged Miller test, which requires more work and putting that question to the jury to decide whether all three prongs have been satisfied. Whereas possessing actual photographic CSAM is practically a strict liability offense. And so the amount of work required is just way higher, I think, for a child obscenity prosecution than for using the workhorse child pornography statute to go after people for possessing real material.
The possession of real material I think is also going to be a way that prosecutors will be able to go after people, not for possessing what is believed to be or even provably is AI generated material, but rather going after people for possessing the real material that they're likely to have as well. So you can imagine that there may be an investigation that gets kicked off by what is believed to be or even confirmed to be an AI generated image. But that once you examine somebody's devices, it turns out that they have terabytes and terabytes of confirmed known real abuse material of the type you were talking about, Alex, that may have been floating around for years and is well known to involve real children.
So I think that is something that we will start to see more and more. It's already the case, the paper cites a few cases where prosecutors found that somebody had a bunch of drawings, cartoons, computer generated images, and just charged them for the confirmed real photographic abuse imagery that they had in their possession as well.
David: If I remember correctly, with the cases that were strictly about illustrated obscene material, the reason that hasn't gotten thoroughly tested is because they're taking pleas for that as well. Where they're saying, "Hey, do you want two years? Or do you want to try and challenge this and, if you fail, get five?"
Riana: Yeah, that's a great point. I think there's still going to be a disincentive to potentially go to trial here. Even with real material, a lot of defendants don't really care to try their luck in front of a jury because juries don't like pedophiles very much, and so you often will see people pleading out. And for material that is obscene, as we were talking about earlier, a lot of overlap there between what is CSAM and what is obscenity. And so there will still be the ability to use obscenity doctrine. I have some concerns about how that might bring back regressive social norms, but it is going to be an option that will still be there. And, as David was mentioning, we might not necessarily see defendants with a lot of appetite for pressing their luck in terms of their First Amendment rights to create this kind of material.
Evelyn Douek: Yeah. Can we just unpack that just really quickly because I think that that's an important part of what you're saying? The regressive social norms. I think this is an important argument in the paper about one of the weaknesses of relying on the obscenity statute and the quirks of the Miller test. What do you mean there? What's the weakness? What's your concern there?
Riana: Sure. So last fall, my friend Kendra Albert at Harvard, who I'm sure you must know from your time there, Evelyn, published a terrific paper about the community standards requirement of the Miller test for whether something is obscene. And in that paper, Kendra pointed out that well into the 1990s, there were cases where juries left to decide whether a particular image was obscene or not were coming out on the side of saying it was obscene where it involved homosexual conduct, where it probably would not have been considered obscene if it involved heterosexual conduct. And right now, just the mere existence of queer and trans people is under attack. There are laws in dozens of states that have cropped up like mushrooms just in the last three years to punish and be prejudiced against and discriminate against trans people, including trans children. We're in a time where people are being maligned as child groomers just for existing.
And so using the community standards of a particular community, you can imagine that if juries are left up to determine whether under the child obscenity statute a given image is obscene or not, it should not be the case that whether somebody goes to prison or not is determined by whether or not the image depicts heterosexual versus homosexual conduct or cisgender versus transgender bodies. I don't think that would be a net win for society. And so I have a lot of fears about the potential use in different communities of the child obscenity statute that has long kind of withered on the vine in terms of its utility in the pre generative AI age.
In general, we are in a moment where American society has just lost its God damn mind when it comes to anything to do with children, and that is where I think the need for the First Amendment is strongest. Because otherwise you go after art, you go after sexual health information, you go after anything that deviates from an extremely narrowly defined norm. And so I think if anything, having a robust free speech doctrine is as important as it's ever been in the current moment where just about anything can be justified under the banner of child safety.
Evelyn Douek: Right. I think it's important to talk maybe a little bit about the different kinds of harm that we're concerned with. One kind of harm, as Alex was sort of hinting at before, is when the virtually generated imagery depicts a real person and depicts them in a situation that they've never been in, and there are certain dignitary and privacy harms that result from that. Whether it be a child, where it might be especially egregious, or whether it be Taylor Swift, or teenage girls at high school, or many other predominantly women who find themselves in these situations. And that's one kind of harm. And it's pretty clear, actually, I think, that Ashcroft, the case striking down the virtually generated CSAM predecessor statute, was not concerned with that kind of harm. It was talking about situations where there was no real child involved, where it was depicting someone that did not exist. So that I guess is one question.
But then the other question: the reason why a lot of this material can be proscribed, why child pornography can be proscribed by the law, is because it is related to the harm of children, the actual abuse of children. And this can occur, I think, even when there is not an actual individual depicted. And so David, I'm hoping that you can talk a little bit about the training of the models that produce this material and how that can involve the harm of real children even if those victims don't end up being actually depicted in the output.
David: Well, when it comes to the training process, there are a few different harms. One is that models can just conflate concepts. As I mentioned, they've been trained on a decent amount of explicit adult content and they've been trained on images of children. Those are multipurpose models, and a model knows how to combine concepts; it has concepts of age that can be manipulated. And there are also some cases, when we did that project looking at the material used to train the models, where there were repeated matches of content with the same match ID. So there's some degree of reinforcement of individual pieces of CSAM, which theoretically could further bias the output of those models to produce things more closely resembling those repeated images.
And you don't need a large number of repetitions to teach a model a particular concept. So for example, if you wanted to make a modification, an augmentation, for these models to make imagery of yourself, you can take 10, 20, 30 photos of yourself and use that to train an augmentation that will actually come up with a decent likeness of you. That's one of the advantages of these models: you don't have to train them on very large amounts of data. So those are both things that contribute to biases in the output, the ability to combine those concepts and to resemble some portion of the input.
One thing, not so much on a technological level, but something that we've seen in a number of cases, is that when you have these hosted services that are basically Stable Diffusion models with a little bit of prompt conditioning, where you upload an image and it will undress this person for you or put them in explicit imagery, people's rejoinder is usually, "Well, why should it bother you if it's not 'real'?" In the child safety sphere in particular, people generally don't use the term "real" CSAM to describe what might be a photograph. And part of the reason is that that framing tries to dilute the harm, basically getting you on a technicality: you shouldn't be harmed because this didn't actually happen in real life.
And the reason that real versus not-real distinction is disfavored is because when you have content that has been altered or generated, the psychological and social impacts on that person can be just as bad, and in some cases worse, than if it were distribution of a photograph of something that had happened in real life. And that is something that in a number of places is still not very well addressed, particularly when we look at non-consensual distribution of imagery of adults. Some specific laws draw distinctions around why the material was distributed, in what circumstances it occurred, whether it was altered, that kind of thing.
Evelyn Douek: So thinking, though, about the situation where the output is not an identifiable person, but it's produced by a model that has been trained on real images, there's obviously still a harm there created by the fact that there is a demand for these real images and the reuse of those images. But I'm curious, Riana, if you have thoughts about the mens rea requirements that we might have here, in terms of thinking about who can be liable for using a model that's been trained on images of photographic CSAM, or how to think about how far liability should extend when these victims are being harmed by this use. Because as some of SIO's work and David's work showed, people don't always necessarily know what the models have been trained on.
Riana: Yeah, I think there's room for potentially criminalizing the possession of a model that has been trained on photographic abuse imagery. And this is one of the things the paper suggests as kind of a lacuna in existing law. But I think you would need to have very carefully tailored language and a knowledge mens rea standard, because if we think back to when David published his research on the LAION-5B dataset in December, finding that there had been confirmed CSAM involving real children in the training data, everybody freaked out about possessing a copy of that or having used it to train something. And I don't think we really want it to be the law that everybody who downloaded a copy of that dataset, or produced images using LAION-5B without knowing or having any reason to know that there had been actual abuse imagery in that dataset, is guilty of a crime. We have due process standards in this country for a reason.
And so I think with regard to ML models especially, Congress has taken care in other similar laws not to criminalize the possession and use of general purpose technology. And I think we would need to do something the same here to ensure that we are only focusing on knowing and intentional bad conduct rather than the LAION-5B type scenario. When it comes to something like non-consensual intimate imagery of adults, where the rationale is very different than when it comes to children, I see the argument for a knowledge requirement there too, to be honest. I know there are different laws, David was talking about this before we started recording, that focus on whether there is harm versus whether there is an intent to commit harm. And again, where we have a big open internet that has a lot of images flowing around, provenance unknown, it may not always be the case that somebody who possesses a revenge porn image knows that it is a revenge porn image. And I don't think we necessarily want to put a bunch of people in prison without having them meet some fairly strict knowledge requirements.
That said, that's not to downplay the harm that does happen to people, children and adults, from being depicted in this material, from having their actual images shared without their consent, or from being depicted in computer generated images. It's just to say that we already imprison more people than anywhere else on the planet, and I think we need to be very careful about passing more laws that add to that without being very careful about the mens rea.
Alex: Is any of this tied up in all of the different lawsuits around the copyright issues? It feels like this is just an overall problem, that our legal system doesn't really know how to deal with the incredibly effective compression technology that's been invented with these models, where you could take a petabyte of data and turn it into a terabyte of model weights and still recreate stuff that perfectly reflects that but is partially new. Are all these different lawsuits from the Sarah Silvermans of the world and the Disneys and such, is that going to change the output here, or are these totally different parts of law?
Riana: I'm curious what Evelyn thinks, but I think at least copyright stays in the copyright lane and the criminal law stays in the criminal law lane, insofar as I don't think you can get a copyright on CSAM. I glanced at this a little bit earlier because I figured this sort of thing might come up. And so I don't think anybody is going to show up in court to assert their rights, because you can't register a CSAM image with the Copyright Office, and you need a registered image in order to sue. So I don't know that that's going to be shifting the bounds there.
That said, I think Evelyn can speak to how content cartels love to see broad use of copyright as a means of effectuating content moderation outcomes and liability outcomes in other areas of the law beyond just copyright. And so when it comes to intimate imagery, for example, we've long seen people say the only way to get this taken down is to own the copyright in it, or if it's your own image that you took as a selfie, you can file a DMCA claim and platforms will take it down under the DMCA, even though they might not necessarily take it down if it's "just", air quotes there, your nude that has been spilled all over the internet. And so we might still see that, because US law has traditionally very strongly protected the rights of copyright owners. I can see a world where generative AI creates another scenario where people come to rely on copyright law, as they have before with DMCA takedowns, to try and assert their rights to get their images taken offline in the absence of a federal-level NCII statute.
Evelyn Douek: Yeah, no, I think that's a great summary. I think we've seen the promises and limits, real limits of copyright law in other areas concerning intimate imagery as well. But I think the underlying point is a great one, which is that there's this whole new problem and the legal system doesn't really know how to deal with this at all. Even though we have these cases generally on the idea of virtually generated child pornography from 2002, I think that so much has changed in terms of the problems that we're talking about, that this is going to spark a whole bunch of new lawsuits and legal challenges and First Amendment issues, especially when it's hyper-realistic and does depict an actual person.
Alex: I can't wait for the Supreme Court to argue about what the founders thought about linear algebra matrices with 75 billion parameters in it. It's just clearly, what did Thomas Jefferson believe when he [inaudible 00:36:37]-
Evelyn Douek: Yeah, I discovered notes in his library. We found the sheet of paper, it answers all the questions. So yes. No, exactly. I mean, I think that the law isn't well-equipped to deal with a lot of these issues.
But we've been focusing on law enforcement response. And I think one other really important actor before we close out that we should talk about, and you mentioned this Riana, about content cartels, we should talk about the platforms and we should talk about what the private sector can do here or might be doing and might be counterproductive in this space and how they should be thinking about this new issue. So given that we're now in a world where there is this flood of virtually generated child sexual abuse material, and we have these congressional hearings, there's lots of concerns about child safety online, how should platforms be thinking about this? Are they doing anything and what should they be doing?
Riana: Well, I'll be curious to hear from David about what the generative AI companies are doing here. But in general, if you are an electronic service provider, you are required to report apparent violations of the child sexual exploitation and abuse laws when those occur on your platform. You have to report them to a reporting pipeline called the CyberTipline that's run by the National Center for Missing and Exploited Children. The CyberTipline receives 32, 33 million reports per year, almost all of them from online service providers, the bulk of those from Meta in particular and the other usual suspects, large entities.
And once this kind of material starts cropping up on the open web rather than just being traded in niche communities, it's going to get reported just like photographic abuse imagery is, because there's very little incentive for platforms, I think, to put in the work to try to determine whether something is real or not. They face a lot of liability, including criminal fines, if they guess wrong and decide not to report an image that turns out to be photographic when they thought maybe it was AI generated. Whereas on the flip side, they generally don't face liability if they report an image that is not actually CSAM, that is an image of an adult, or is a drawing, or is something that a pediatrician asks a parent to send in. And so there's going to be all of this AI generated material that can be reported alongside photographic abuse material, and platforms are basically going to leave it, I think, to NCMEC and to law enforcement, to whom NCMEC routes those reports, to figure out whether something is AI generated or an actual image of a real child.
Evelyn Douek: The good news is they're extremely well-resourced and have plenty of staff to handle this incoming flood of new reports. Is that right?
Riana: Well, I mean, Elon Musk is apparently adding 100 whole jobs in Austin to be equal to the ingenuity of these communities and of child safety offenders in general. We have the PhotoDNA system and other systems for detecting known, confirmed abuse imagery, and as David could probably explain, that's probably not going to go very well in the forthcoming generative AI context. The current reporting systems are very good at finding and reporting known images that have been added to a hash list of confirmed abuse imagery. They're not built to detect novel synthetic media at scale. And I don't know how much effort platforms are going to put into trying to tell real from fake when they can basically pass the hot potato down the line, report it all, and let somebody else deal with it.
David: Yeah, I mean, those systems aren't good at detecting novel material at scale, period. But in this case, we're moving from this model where we've got those fingerprints of known instances of CSAM to one where you're going to be dealing with a huge amount of novel material. And really the only way that we know to try to mitigate that is to also use machine learning. So you've got these systems burning trees to churn out CSAM, and then we burn more of them to use machine learning to try and detect whether this is, A, a synthetic [inaudible 00:40:44]-
Evelyn Douek: Good thing that doesn't cause any other problems.
David: Yeah. So really the platforms, all they can do is try and come up with models that can detect this, which is in and of itself problematic because how are you going to train those models? What is the source material of the things that are going to help detect this, and what is the legal status of even doing that to begin with?
So another issue is that doing this at scale is not just a technological problem but also a bit of a structural problem, which is that the only people historically that have had access to PhotoDNA have been a relatively small number of online platforms, though new ones can get it. When it comes to machine learning models, most of the big players are rolling their own to help them detect this stuff, and those aren't open. And arguably, having such a model be openly distributed would cause its own problems. But this does mean that when a small provider or a fediverse instance or something out there wants to try to mitigate this problem, there's nothing really off the shelf that they can use. It's all inside of these large providers. And we've seen smaller-scale sites basically be knocked offline by people posting large amounts of synthetic CSAM, because even if they went to PhotoDNA, it wouldn't recognize all of these new images that were just produced today, and there's no model that they can use.
So you had sites disabling images, literally disabling images, because they had no off-the-shelf way to mitigate what was basically the use of CSAM as an attack.
Evelyn Douek: And just the other question to close out is to think about whether you have any thoughts about what model developers can be doing or whether there's any other... When we're talking about platforms, like often we're talking about the social media platforms, but there are other platforms here that are enabling this. And so is there a way that there can be more responsible work done at that level as well?
David: Yeah, so I mean, when we talk about platforms I usually think of it as three separate things. You've got who we're usually talking about, which is the Facebooks and Googles that are operating these distribution and publication channels. And then you've got the platforms that are, for example, hosted image generators, things like OpenAI, that kind of platform. And then lastly, you have the platforms that are distributors of these open source models. So that'll be the Civitai, et cetera of the world. And they all kind of have their own different responsibilities.
When it comes to the people that are actually training or hosting models that they've trained themselves, there are a number of things they can do to the training data, to reweight things and to ensure that no actually existing CSAM is in those datasets. They can change the way they train individual models by excluding explicit content or excluding imagery of children, and not having both of those things be in the same model.
There's also some things that you can do to existing models to do what people have variously called concept ablation or concept erasure, where you might take a model that is very popular for producing legal and or illegal explicit content, and you can then retrain that model to say, hey, you don't know what a child or a cat or anything like that is, or this copyrighted character. You can actually turn that into a model that will refuse to generate that output in any significant way. So that's a rather new technology that people are just now starting to experiment with.
But from the folks that we've talked to, it does look like that is a potentially promising way in which models could be altered. And the question now is, okay, well, can you alter them back after the fact? We don't know the answer to that yet, but it'll be an interesting space to watch.
Evelyn Douek: I was so close to saying it's nice to close out on a slight note of optimism, and then you took a sharp turn right at the end and said, but maybe not. But regardless, clearly these are issues at the forefront of technology, where we're still working out what's possible, and at the forefront of the law. I couldn't agree more that we're going to see a bunch of legal challenges testing the limits of these laws in the coming years and testing the limits of law enforcement. And so we will have to have you both back on as this all plays out and we see how both technology and the law respond to these emerging issues. But thank you very much for joining us today.
And this has been your Moderated Content Weekly Update. The show is available in all the usual places, including Apple Podcasts and Spotify. Show notes and transcripts are available at law.stanford.edu/moderatedcontent. This episode wouldn't be possible without the research and editorial assistance of John Perrino at the Stanford Internet Observatory, and it is produced by the wonderful Brian Pelitier. Special thanks also to Justin Fu and Rob Huffman. Talk to you next week.