Inside the Hollywood writing that drives generative AI
This is Atlantic Intelligence, a newsletter in which our writers help you think about artificial intelligence and the new machine age. Did someone forward you this newsletter? Sign up here.
Earlier this week, The Atlantic published a new investigation by Alex Reisner into the data being used without permission to train generative-AI programs. In this case, companies such as Apple, Anthropic, Meta, and Nvidia have collected dialogue from tens of thousands of movies and TV shows to develop large language models (or LLMs).
The data have a strange origin: Rather than being pulled from scripts or books, the dialogue is taken from subtitle files extracted from DVDs, Blu-ray discs, and internet streams. “Although this may seem like an odd AI training resource, captions are valuable because they are a written form of dialogue,” Reisner wrote. “They have the rhythm and style of conversation and help tech companies expand the reach of generative AI beyond academic textbooks, journalism, and novels, all of which are also used to train these programs.”
Perhaps it is no longer shocking that creative humans are having their work taken to train machines that threaten to replace them. But evidence that clearly shows what data are being used, and for what purpose, is difficult to come by because of the secretive nature of these technology companies. “At least now we know a little more about who is caught in the machine,” Reisner wrote. “The world will decide what they are owed.”
There is no longer any doubt that Hollywood writing is driving AI
By Alex Reisner
For as long as generative-AI chatbots have been on the internet, Hollywood writers have wondered whether their work is being used to train them. The chatbots are remarkably fluent in movie references, and companies seem to train them on every available source. One screenwriter told me recently that he had seen an AI produce very close imitations of The Godfather and the 1980s TV show Alf, but he had no way to prove that the program had been trained on that material.
I can now say with confidence that many AI systems have been trained on the work of TV and film writers. Not just on The Godfather and Alf, but on more than 53,000 other movies and 85,000 other TV episodes: Dialogue from all of them is included in an AI training dataset used by Apple, Anthropic, Meta, Nvidia, Salesforce, Bloomberg, and other companies. I recently downloaded this dataset, which I have seen referenced in papers about the development of various large language models (or LLMs). It includes writing from every film nominated for Best Picture from 1950 to 2016, at least 616 episodes of The Simpsons, 170 episodes of Seinfeld, 45 episodes of Twin Peaks, and every episode of The Wire, The Sopranos, and Breaking Bad. It also includes prewritten “live” dialogue from Golden Globes and Oscars broadcasts. If a chatbot can imitate a crime-family gangster or a sitcom alien, or, more to the point, if it can piece together entire shows that might otherwise require a writers’ room, data like this are part of the reason why.
Read the full article
What to read next