Can generative AI be stopped by digital watermarking?

The Biden White House recently issued its latest executive order aimed at establishing a framework for the development of generative artificial intelligence. The order calls for content authentication and digital watermarking to denote when digital assets produced by the federal government are computer generated. In an era of generative AI misinformation, these and other copy protection technologies could help content creators authenticate their online works more securely.

A brief watermarking history

Analog watermarking techniques were first developed in Italy in 1282. Papermakers would insert thin wires into the paper mold, producing almost imperceptibly thinner areas of the sheet that became visible when held up to light. Beyond identifying where and how a company's paper was made, analog watermarks were also used to transmit hidden, coded messages. By the 18th century, governments were using the technology to combat currency counterfeiting. Around the same time, color watermarking techniques emerged, which sandwich dyed material between layers of paper.

Although the term "digital watermarking" wasn't coined until 1992, the Muzak Corporation first patented the underlying technology in 1954. Its system, which remained in use until the company was sold in the 1980s, employed a "notch filter" to store identification data by blocking the audio signal at 1 kHz in specific bursts, similar to Morse code.
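
Muzak's patented system isn't public code, but the basic idea is straightforward to sketch: suppress a narrow band around 1 kHz during timed bursts to write bits, then read them back by measuring how much 1 kHz energy each burst retains. Below is a minimal, illustrative Python sketch; every function name, parameter, and threshold here is a hypothetical stand-in, not the 1954 design.

```python
import numpy as np
from scipy.signal import iirnotch, lfilter

def encode_notch_watermark(audio, bits, fs=44100, burst_ms=200, notch_hz=1000.0):
    """Toy Muzak-style encoder: for each '1' bit, notch out ~1 kHz in that burst."""
    b, a = iirnotch(notch_hz, Q=30.0, fs=fs)  # narrow band-stop filter at 1 kHz
    burst = int(fs * burst_ms / 1000)
    out = audio.astype(np.float64).copy()
    for i, bit in enumerate(bits):
        start, stop = i * burst, (i + 1) * burst
        if bit and stop <= len(out):
            out[start:stop] = lfilter(b, a, out[start:stop])
    return out

def decode_notch_watermark(audio, n_bits, fs=44100, burst_ms=200, notch_hz=1000.0):
    """Read bits back by comparing 1 kHz energy to an untouched neighboring band."""
    burst = int(fs * burst_ms / 1000)
    t = np.arange(burst) / fs
    probe = np.exp(-2j * np.pi * notch_hz * t)        # single-bin DFT at 1 kHz
    ref = np.exp(-2j * np.pi * (notch_hz + 200) * t)  # reference bin at 1.2 kHz
    bits = []
    for i in range(n_bits):
        seg = audio[i * burst:(i + 1) * burst]
        # the 0.5 ratio is an illustrative threshold, not a calibrated one
        bits.append(1 if abs(np.dot(seg, probe)) < 0.5 * abs(np.dot(seg, ref)) else 0)
    return bits
```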

Advertising monitors and audience measurement companies like the Nielsen Company have long used watermarking techniques to track and understand what American households are watching. These steganographic techniques also appear in the modern Blu-ray standard (via the Cinavia system) and in government applications like authenticating driver's licenses, national currencies, and other sensitive documents. The Digimarc Corporation, for instance, developed a packaging watermark that prints a product's barcode almost imperceptibly across the entire box, allowing any digital scanner with line of sight to read it. The technology has also been employed in applications like brand anti-counterfeiting and improving the efficiency of material recycling.

The present moment

Modern digital watermarking operates on the same principles, using specialized encoding software to subtly embed additional information into a piece of content, whether image, video, or audio. These watermarks are easily read by machines but nearly invisible to humans. Unlike existing cryptographic protections such as product keys or software dongles, which actively prevent the unauthorized alteration or duplication of content, watermarks serve as a record of where the content originated or who holds the copyright.
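
Production watermarks embed in robust transform domains precisely so the mark survives compression and resizing, but the easiest way to see the core idea of machine-readable, invisible embedding is a least-significant-bit sketch. The following Python is a toy illustration under that assumption, not any vendor's actual scheme.

```python
import numpy as np

def embed_lsb(pixels: np.ndarray, message: bytes) -> np.ndarray:
    """Hide `message` in the least significant bit of each 8-bit pixel value.

    Changes each value by at most 1 (invisible to the eye, trivial for a
    machine to read back), but unlike real watermarks it is destroyed by
    any re-encoding or resizing.
    """
    bits = np.unpackbits(np.frombuffer(message, dtype=np.uint8))
    flat = pixels.flatten().copy()
    if bits.size > flat.size:
        raise ValueError("message too large for this image")
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits  # overwrite the LSBs
    return flat.reshape(pixels.shape)

def extract_lsb(pixels: np.ndarray, n_bytes: int) -> bytes:
    """Collect the LSBs back into bytes."""
    bits = pixels.flatten()[:n_bytes * 8] & 1
    return np.packbits(bits).tobytes()
```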

The system is not flawless, however. The only thing preventing copyrighted works from being trained on [by generative AI models] is "the unreliable, unenforceable word of AI companies," Dr. Ben Zhao, Neubauer Professor of Computer Science at the University of Chicago, told Engadget via email.


There are no cryptographic or regulatory safeguards in place to protect copyrighted works, he said. Stability, he argued, has turned opt-out lists into a mockery, changing the model name to SDXL in order to ignore everyone who had registered to opt out of SD 3, while Facebook/Meta told users on its most recent opt-out list that, because they could not prove their work had already been trained into the model, they could not opt out.

The White House's executive order, according to Zhao, is "ambitious and covers tremendous ground," but its current plans lack many of the "technical details on how it would actually achieve the goals it set."

Many companies, he adds, are under no legal or regulatory obligation even to consider watermarking their genAI output, and voluntary measures are ineffective in an adversarial environment where stakeholders are incentivized to disregard or evade rules and oversight.

Like it or not, businesses exist to generate revenue, and avoiding regulation is often in their financial interest.

Since an executive order lacks the constitutional standing of congressional legislation, it would also be simple for the next presidential administration to come into office and dismantle Biden's order, along with all of the federal infrastructure built to implement it. And don't expect any action from the House or Senate on the matter, either.

Anu Bradford, a law professor at Columbia University, told MIT Tech Review that Congress is extremely unlikely to pass any meaningful AI legislation in the near future because it is deeply polarized and dysfunctional. So far, the industry's major players have backed these watermarking schemes with little more than pinky swears as enforcement.

How Content Credentials work

As the wheels of government slowly turn, industry alternatives are becoming increasingly necessary. In 2019, Microsoft, the New York Times, CBC/Radio-Canada, and the BBC launched Project Origin to safeguard the integrity of content regardless of the platform on which it is consumed. Around the same time, Adobe and its partners introduced the Content Authenticity Initiative (CAI), which approaches the problem from the creator's perspective. CAI and Project Origin eventually joined forces to form the Coalition for Content Provenance and Authenticity (C2PA). From this coalition of coalitions, Adobe unveiled Content Credentials (CR) at its Max event in 2021.

Whenever an image is exported or downloaded, CR appends additional details about it in a cryptographically secure manifest: the creator's information, where and when the image was taken, what device captured it, and whether generative AI systems like DALL-E or Stable Diffusion were used in its making. Websites can then compare the data in the image or video header against the provenance claims made in the manifest. Combined with watermarking technology, this produces a unique authentication method that cannot be easily stripped the way EXIF and other metadata (i.e., the technical information automatically added by the program or device that captured the image) routinely are when uploaded to social media sites, because the file is cryptographically signed, somewhat like blockchain technology.
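
C2PA's real manifests are signed structures (COSE signatures in JUMBF boxes) embedded in the file itself; the Python sketch below is only a toy illustration of the underlying idea, binding signed provenance claims to the exact content bytes. The field names and the choice of an Ed25519 key are hypothetical.

```python
import hashlib, json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def make_manifest(image_bytes, claims, signing_key):
    """Bind provenance claims to these exact bytes, then sign the bundle."""
    manifest = {
        "claims": claims,  # e.g. creator, capture device, genAI use
        "content_hash": hashlib.sha256(image_bytes).hexdigest(),
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = signing_key.sign(payload).hex()
    return manifest

def verify_manifest(image_bytes, manifest, public_key):
    """A site checks the signature AND that the bytes still match the hash."""
    body = {k: v for k, v in manifest.items() if k != "signature"}
    payload = json.dumps(body, sort_keys=True).encode()
    try:
        public_key.verify(bytes.fromhex(manifest["signature"]), payload)
    except InvalidSignature:
        return False
    return hashlib.sha256(image_bytes).hexdigest() == manifest["content_hash"]

# key = Ed25519PrivateKey.generate()
# m = make_manifest(img, {"creator": "Jane", "device": "Leica M11-P"}, key)
# verify_manifest(img, m, key.public_key())  # True; False after any edit
```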


Metadata rarely survives the common workflows content passes through as it moves around the internet, Digimarc Chief Product Officer Ken Sickles told Engadget, because many online systems simply weren't designed to support or read it and ignore the data instead.

The analogy "we've used in the past is one of an envelope," Tony Rodriguez, chief technology officer at Digimarc, told Engadget. The valuable information you want to send is placed inside, "and that's where the watermark sits. It is actually a component of the media": part of the pixels, the audio, whatever it may be. The metadata and all that other information is written on the outside of the envelope.

If someone manages to remove the watermark (it turns out that's not difficult: just screenshot the image and crop out the icon), the credentials can be reattached through Verify, which runs machine vision algorithms against an uploaded image to find matches in its repository. If the uploaded image is recognized, the credentials are reapplied. A user who comes across the content in the wild can click the CR icon to see the full manifest, verify the information for themselves, and make a more informed decision about which online content to trust.
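
Digimarc hasn't published how Verify's matching works internally. One plausible way to illustrate the concept is a perceptual hash, which, unlike a cryptographic hash, changes only slightly under re-encoding and small edits, so a stripped upload can still be matched back to a registered original. Everything in this sketch, including the `max_distance` tolerance, is a hypothetical stand-in.

```python
import numpy as np
from PIL import Image

def average_hash(path: str, size: int = 8) -> int:
    """64-bit perceptual hash: shrink, grayscale, threshold against the mean."""
    img = Image.open(path).convert("L").resize((size, size))
    pixels = np.asarray(img, dtype=np.float32)
    bits = (pixels > pixels.mean()).flatten()
    return int("".join("1" if b else "0" for b in bits), 2)

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def lookup_credentials(upload_path: str, repository: dict, max_distance: int = 10):
    """repository maps stored hash -> that image's Content Credentials manifest."""
    h = average_hash(upload_path)
    for stored_hash, manifest in repository.items():
        if hamming(h, stored_hash) <= max_distance:
            return manifest  # close enough: reattach these credentials
    return None
```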

Sickles envisions these authentication systems operating in cooperating layers, like a home security system that expands its coverage by pairing locks and deadbolts with cameras and motion sensors. "That's the beauty of watermarks and content credentials combined," Sickles remarked. Together, as a foundation for authenticity and for understanding the provenance of an image, they strengthen the system far more than either would alone. Digimarc is incorporating the Content Credentials standard into its existing Validate online copy protection platform, and the company freely distributes its watermark detection tool to generative AI developers.

The standard is already appearing in real-world commercial products like the Leica M11-P, which automatically attaches a CR credential to photos as they are taken. The New York Times, Reuters, and Microsoft have explored its use in journalism, and it has also been included in the Bing AI chatbot and the ambitious 76 Days feature. Sony is reportedly incorporating the standard into its Alpha 9 III digital cameras and will bring it to the Alpha 1 and Alpha 7S III models through firmware updates released in 2024. CR is likewise built into Adobe's extensive collection of photo and video editing tools, including Illustrator, Adobe Express, Stock, and Behance. Firefly, the company's own generative AI, automatically includes non-personally identifiable information in a CR for some features, such as Generative Fill (essentially noting that the feature was used, but not by whom); otherwise, CR is opt-in.


However, the C2PA standard and its front-end Content Credentials are still in the early stages of development and remain very hard to find on social media. What really matters, according to Sickles, is how widely these technologies are adopted and where, both from the perspective of the content credentials themselves and of linking them with a watermark.

Nightshade: the database-killing alternative to CR

Some security researchers, rather than wait for written laws or industry standards to take hold, have taken copy protection into their own hands. Teams from the SAND Lab at the University of Chicago have created two incredibly nasty copy protection systems for use specifically against generative AIs.

Glaze, the creator-friendly system built by Zhao and his team, leverages the concept of adversarial examples to disrupt generative AI mimicry. It alters an artwork's pixels in ways that are invisible to the human eye but look radically different to a machine vision system. A generative AI system trained on these "glazed" images cannot accurately mimic the intended art style: cubism comes out cartoonish, abstract styles turn into anime. This could prove especially useful to well-known and frequently imitated artists in protecting the commercial viability of their distinctive styles.
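
Glaze's actual optimization, described in the SAND Lab's paper, targets the feature extractors used by style-mimicry models; the sketch below is only a generic feature-space adversarial perturbation in that spirit. The extractor, loss, budget, and every parameter here are stand-ins, not Glaze's algorithm.

```python
import torch
import torchvision.models as models

# Stand-in feature extractor (Glaze targets style-transfer features instead)
extractor = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
extractor.fc = torch.nn.Identity()  # keep the penultimate features
extractor.eval()

def cloak(image, style_target, epsilon=4 / 255, steps=50, lr=0.01):
    """Nudge `image` so the extractor 'sees' the target style, while keeping
    every pixel within +/- epsilon of the original (imperceptible to people).

    image, style_target: (1, 3, H, W) tensors with values in [0, 1].
    """
    with torch.no_grad():
        target_feat = extractor(style_target)
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # pull the perturbed image's features toward the target style
        loss = torch.nn.functional.mse_loss(extractor(image + delta), target_feat)
        loss.backward()
        opt.step()
        with torch.no_grad():  # project back into the imperceptibility budget
            delta.clamp_(-epsilon, epsilon)
            delta.copy_((image + delta).clamp(0, 1) - image)
    return (image + delta).detach()
```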

Where Glaze focuses on prevention, deflecting the efforts of unauthorized data scrapers, SAND Lab's most recent tool is emphatically punitive. The system, known as Nightshade, also subtly alters the pixels of a given image, but rather than confusing the models trained on it the way Glaze does, the poisoned image corrupts the entire training database it is ingested into, requiring the manual removal of each harmful image to fix the problem.
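
Nightshade's published attack is considerably more sophisticated, but the conceptual shape of a poison sample can be sketched by reusing the hypothetical cloak() function above: the caption stays truthful while the pixels are pushed toward an unrelated concept, so a text-to-image model trained on enough such pairs learns the wrong association for that caption.

```python
# Conceptual sketch only, NOT Nightshade's algorithm. Assumes cloak() above.
def make_poison_sample(image, caption, unrelated_concept_image):
    """Return a training pair whose pixels and caption disagree."""
    poisoned = cloak(image, unrelated_concept_image)  # features resemble the wrong concept
    return poisoned, caption  # the mismatch corrupts what `caption` means to the model
```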

The tool is intended not as an attack vector but as a "last resort" for content creators. Zhao likened it to putting hot sauce in your lunch because someone keeps stealing it from the refrigerator.

Zhao doesn't have much sympathy for model owners harmed by Nightshade. Companies that purposefully ignore opt-out lists and do-not-scrape directives know exactly what they are doing, he argued. There is no "accidental" downloading or training on data; taking someone else's content, downloading it, and training on it takes real effort and intent.
