The Daily Melbourne

Melbourne's digital archives are drowning in duplicate images — and the fix lags behind London and Amsterdam

Cultural institutions across the city are grappling with bloated image libraries riddled with near-identical files, a problem that comparable cities resolved years ago with dedicated deduplication programs.

By Melbourne News Desk · Published 5 July 2026, 5:00 am

Photo: Daniel Dang on Pexels

Melbourne's major public cultural institutions hold millions of digitised images across their collections — and a growing share of that storage is consumed by duplicates. The State Library Victoria, the City of Melbourne's digital asset archive, and several university libraries have each acknowledged the problem in internal digitisation reviews, yet no unified city-wide deduplication framework exists. Sydney, London, and Amsterdam have all moved faster.

The issue has sharpened in 2026 because state and federal digitisation grants tied to the National Cultural Policy, released in January 2023, require institutions to report storage efficiency metrics for the first time. That reporting deadline falls in September 2026, and archivists say the scramble to audit duplicate image holdings has exposed just how deep the problem runs. In the La Trobe Reading Room collections alone, early audit work has flagged that a significant proportion of high-resolution TIFF scan batches contain near-identical frames generated by multi-shot camera rigs, files that were retained rather than culled at the point of ingest.

What other cities figured out first

Amsterdam's Rijksmuseum resolved a comparable problem in 2021 when it deployed a perceptual hashing pipeline across its 900,000-item online collection, cutting duplicate or near-duplicate image variants by roughly a third within 18 months, according to the museum's own published technical documentation. The British Library in London embedded automated deduplication checkpoints directly into its digitisation workflow as part of a 2019 infrastructure overhaul, meaning duplicates are now flagged before they enter long-term storage rather than after.
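For readers unfamiliar with the technique, perceptual hashing reduces an image to a short fingerprint so that visually similar files produce similar fingerprints. The sketch below is a minimal illustration of one common variant, the average hash, and is not a description of the Rijksmuseum's or the British Library's actual pipelines. The tiny 4x4 grayscale grids and the function names are hypothetical; a real workflow would first downscale each scan to a small grayscale thumbnail using an imaging library.

```python
from typing import List


def average_hash(pixels: List[List[int]]) -> int:
    """Return one bit per pixel: 1 if the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


# Hypothetical 4x4 grayscale thumbnails (0 = black, 255 = white).
frame_a = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [10, 20, 200, 210],
    [15, 25, 205, 215],
]
# Same scene with a slight exposure shift, as a multi-shot rig might produce.
frame_b = [[p + 3 for p in row] for row in frame_a]
# A mirrored frame, standing in for a genuinely different image.
frame_c = [list(reversed(row)) for row in frame_a]

print(hamming(average_hash(frame_a), average_hash(frame_b)))  # 0
print(hamming(average_hash(frame_a), average_hash(frame_c)))  # 16
```

A small Hamming distance suggests a near-duplicate. The threshold an institution would use to flag files at ingest is a policy choice and is not specified in the reporting above.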

Melbourne's institutions have largely taken the opposite approach: ingest everything first, clean up later. That strategy made sense when storage was cheap and digitisation was racing to beat physical deterioration of fragile holdings. Storage costs have risen sharply, however, and cloud migration projects at the State Library Victoria and at the University of Melbourne's Baillieu Library mean institutions are now paying ongoing per-terabyte fees rather than amortising the cost of on-premises drives. At current AWS S3 pricing, each unnecessary terabyte of duplicate image storage costs roughly $27 a month, a modest figure per terabyte but a significant one at archive scale across thousands of ingested batches.
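To make the scale argument concrete, the arithmetic can be sketched as follows. The $27 rate is the rough figure quoted above; the 50-terabyte duplicate volume is a purely hypothetical input, not an audit result, and the function name is illustrative only.

```python
# Rough per-terabyte monthly rate quoted in the article; the currency and
# storage tier are not specified there, so treat this as an assumption.
RATE_PER_TB_MONTH = 27.0


def duplicate_storage_cost(duplicate_tb: float, months: int = 12,
                           rate: float = RATE_PER_TB_MONTH) -> float:
    """Cost of keeping `duplicate_tb` terabytes of duplicates for `months` months."""
    return duplicate_tb * rate * months


# Hypothetical example: 50 TB of duplicate scans held for one year.
print(duplicate_storage_cost(50))  # 16200.0
```

The per-terabyte figure looks small in isolation, but it recurs every month for as long as the duplicates stay in cloud storage.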

The City of Melbourne's digital heritage team, based at the Melbourne Town Hall administration precinct on Swanston Street, has been trialling open-source deduplication tooling since March 2026. The pilot covers the city's photographic record of public art installations in Hosier Lane and the Federation Square precinct, two collections that were digitised repeatedly by different contractors over the past decade and now contain substantial overlap. Results from that pilot are expected to inform a broader recommendation to council before the end of the financial year.

Why coordination has been slow

Part of the delay is structural. Unlike Amsterdam's centralised national digitisation authority or the British Library's status as a single institution managing its own estate, Melbourne's cultural image holdings are fragmented across state government bodies, local council, universities, and independent galleries. The Australian Institute of Aboriginal and Torres Strait Islander Studies holds its own Victorian material under separate federal jurisdiction. Getting those entities to agree on shared deduplication standards — let alone shared tooling — requires intergovernmental coordination that moves slowly.

RMIT University's digital preservation research group in the CBD has published work arguing that a federated deduplication model, where each institution runs compatible tooling against a shared hash registry rather than pooling raw files, could work within existing privacy and custodianship constraints. That proposal has circulated among Victorian cultural policy staff since early 2025 but has not yet been adopted as official policy.
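The idea of a shared hash registry can be illustrated with a short sketch. It is an assumption-laden toy, not the RMIT group's design: the class name, method names, institution labels and hash values are all hypothetical, and it matches hashes exactly, whereas a working registry would likely also need near-match lookups.

```python
from collections import defaultdict
from typing import Dict, List, Set


class HashRegistry:
    """Hypothetical shared registry that stores hashes and holder names only.

    No image files leave the contributing institution.
    """

    def __init__(self) -> None:
        self._holders: Dict[int, Set[str]] = defaultdict(set)

    def submit(self, institution: str, hashes: List[int]) -> None:
        """Record that `institution` holds an image with each given hash."""
        for h in hashes:
            self._holders[h].add(institution)

    def shared_hashes(self) -> Dict[int, Set[str]]:
        """Return hashes reported by more than one institution."""
        return {h: who for h, who in self._holders.items() if len(who) > 1}


registry = HashRegistry()
registry.submit("Library A", [0x3333, 0x0F0F])
registry.submit("Council B", [0x3333, 0xAAAA])

for h, who in registry.shared_hashes().items():
    print(hex(h), sorted(who))  # 0x3333 ['Council B', 'Library A']
```

Because only fingerprints are pooled, each custodian keeps control of its own files, which is the property the federated proposal relies on.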

For institutions preparing their September efficiency reports, archivists are recommending a pragmatic short-term step: run a perceptual hash audit on any collection ingested since 2020, when multi-shot scanning rigs became standard and the duplicate problem accelerated. Tools including digiKam and the open-source dupeGuru can process large TIFF batches without specialist infrastructure. That does not solve the governance gap, but it at least quantifies it before the grant reporting clock runs out.
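What such an audit might report can be sketched in a few lines. This is an illustration only and does not reflect how digiKam or dupeGuru work internally. It assumes each file already has a perceptual hash, and the file names, hash values and distance threshold are hypothetical.

```python
from typing import Dict, List


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def group_near_duplicates(hashes: Dict[str, int],
                          max_distance: int = 4) -> List[List[str]]:
    """Greedy grouping: a file joins the first group whose first member
    is within `max_distance` bits; otherwise it starts a new group."""
    groups: List[List[str]] = []
    for name in sorted(hashes):
        for group in groups:
            if hamming(hashes[name], hashes[group[0]]) <= max_distance:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


def redundant_share(groups: List[List[str]]) -> float:
    """Fraction of files that are extra copies beyond one per group."""
    total = sum(len(g) for g in groups)
    redundant = sum(len(g) - 1 for g in groups)
    return redundant / total if total else 0.0


# Hypothetical batch: three near-identical frames and one distinct scan.
batch = {
    "scan_001.tif": 0x3333,
    "scan_002.tif": 0x3331,
    "scan_003.tif": 0xCCCC,
    "scan_004.tif": 0x3333,
}
groups = group_near_duplicates(batch)
print(groups)
print(redundant_share(groups))  # 0.5
```

The redundant share is the kind of single figure that could feed a storage efficiency report, although the grouping here flags candidates for human review rather than deciding what to delete.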

About this article

Published by The Daily Melbourne

This article was produced by The Daily Melbourne editorial desk and covers news in Melbourne. See our editorial standards for how we use AI.
