Melbourne's major cultural institutions are under mounting pressure to resolve a sprawling duplicate image problem that has quietly ballooned across their digital archives, with some collections now carrying error rates that waste significant storage budgets and degrade public search results. The issue has moved from a backroom IT headache to a governance question with real funding consequences.
The problem matters now because state and federal cultural agencies are in the middle of multi-year digitisation pushes. The Public Record Office Victoria, based in North Melbourne, is midway through an ambitious program to bring physical records online. The State Library Victoria on Swanston Street has been expanding its digital catalogue since 2022. Both institutions, along with smaller players across the city, risk encoding the same duplication errors into permanent public-facing databases if they don't settle on a deduplication methodology soon.
What's Actually at Stake
Duplicate images are not just a tidiness problem. When a collection search returns the same photograph four times under slightly different metadata tags, it erodes confidence in the catalogue, skews the usage statistics that inform future acquisition budgets, and runs up cloud storage costs that come directly out of program funding. Industry benchmarks published by the Digital Preservation Coalition suggest duplicates can account for between 15 and 30 percent of total storage in large unmanaged collections, a figure that translates to tens of thousands of dollars annually for a mid-sized institution running cloud infrastructure.
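To make the arithmetic behind that storage figure concrete, here is a rough back-of-envelope sketch. The collection size (500 TB) and the storage price (25 dollars per terabyte per month) are hypothetical inputs chosen only for illustration; they are not figures reported by any Melbourne institution. Only the 15 and 30 percent duplicate shares come from the benchmark range cited above.

```python
def annual_duplicate_cost(total_tb: float, price_per_tb_month: float,
                          duplicate_share: float) -> float:
    """Yearly storage spend attributable to duplicate files.

    duplicate_share is the fraction of total storage taken up by
    duplicates, expressed between 0 and 1.
    """
    if not 0 <= duplicate_share <= 1:
        raise ValueError("duplicate_share must be between 0 and 1")
    return total_tb * price_per_tb_month * 12 * duplicate_share


# Hypothetical mid-sized collection: these two inputs are assumptions.
TOTAL_TB = 500
PRICE_PER_TB_MONTH = 25

low_estimate = annual_duplicate_cost(TOTAL_TB, PRICE_PER_TB_MONTH, 0.15)
high_estimate = annual_duplicate_cost(TOTAL_TB, PRICE_PER_TB_MONTH, 0.30)

print(f"Low estimate:  ${low_estimate:,.0f} per year")
print(f"High estimate: ${high_estimate:,.0f} per year")
```

Under those assumed inputs the duplicate share alone lands in the tens of thousands of dollars a year, which is the order of magnitude the benchmark implies. An institution would need to substitute its own storage volume and pricing to get a meaningful number.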
At the State Library Victoria, the Redmond Barry Collection and the photographic holdings from the Herald and Weekly Times archive are among the most cited examples of material that has been ingested multiple times across different digitisation rounds. The library has not publicly disclosed its current duplicate rate, and a formal methodology for resolving conflicts between near-identical images — where lighting, cropping or scan quality differs — has not been announced.
The City of Melbourne's own digital asset management system, which covers imagery from Council events, planning records and public art documentation, is facing a related fork in the road. The Council adopted a new digital asset platform in late 2024, but staff familiar with the migration have noted that legacy files from the previous system were bulk-imported without first being run through a deduplication pass. That decision, made under time pressure during the transition, is now the subject of an internal review.
The Decisions That Will Define the Next Phase
Three choices dominate conversations among archivists and digital collection managers in Melbourne right now. First: automated versus manual deduplication. Automated hash-matching tools can quickly identify exact, byte-for-byte duplicates, but they miss near-duplicates: the same image at different resolutions, or with different colour correction applied. Manual review is more accurate but expensive. Most institutions will need a hybrid approach, and agreeing on how similar two images must be before they count as duplicates is not straightforward.
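The gap between exact matching and near-duplicate detection can be shown in a few lines. The sketch below is illustrative only and does not describe any institution's actual tooling. It compares a cryptographic hash (SHA-256), which changes completely if a single byte differs, with a simple "average hash", one common perceptual technique that tends to survive small brightness changes. The tiny hand-written 4x4 greyscale grids stand in for real images, which in practice would first be decoded and downscaled, and the `max_distance` threshold of 2 is a hypothetical value, exactly the kind of setting institutions would have to agree on.

```python
import hashlib


def exact_digest(data: bytes) -> str:
    """Cryptographic hash: equal only when the bytes are identical."""
    return hashlib.sha256(data).hexdigest()


def average_hash(pixels: list[list[int]]) -> int:
    """Perceptual hash: one bit per pixel, set when brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a, b, max_distance: int = 2) -> bool:
    # max_distance is a hypothetical threshold, not an agreed standard.
    return hamming(average_hash(a), average_hash(b)) <= max_distance


def to_bytes(pixels: list[list[int]]) -> bytes:
    """Flatten a greyscale grid (values 0-255) into raw bytes."""
    return bytes(value for row in pixels for value in row)


# Stand-in "images": tiny greyscale grids, dark on the left, bright on the right.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]
# The same picture after a uniform brightness adjustment.
brightened = [[value + 10 for value in row] for row in original]
# A different picture: each row mirrored, so bright and dark swap sides.
unrelated = [list(reversed(row)) for row in original]

same_bytes = exact_digest(to_bytes(original)) == exact_digest(to_bytes(brightened))
print("Exact hash match after brightening:", same_bytes)
print("Near-duplicate after brightening:", is_near_duplicate(original, brightened))
print("Near-duplicate of unrelated image:", is_near_duplicate(original, unrelated))
```

The exact hash treats the brightened copy as a completely different file, while the perceptual hash still groups it with the original. Loosening the threshold catches more near-duplicates but raises the risk of merging genuinely distinct images, which is why a hybrid workflow typically sends borderline matches to manual review.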
Second: who owns the canonical version. When two departments or two institutions hold overlapping images, deciding which copy becomes the authoritative record — and which gets archived or deleted — touches on institutional pride, provenance obligations under the Public Records Act 1973, and copyright. The Victorian Electronic Records Strategy, administered by Public Record Office Victoria, provides a framework but leaves significant discretion to individual agencies.
Third: whether to share infrastructure. Several smaller Melbourne organisations — including community archives in Footscray and Fitzroy — lack the in-house technical capacity to run deduplication projects independently. A shared-services model, potentially coordinated through Museums Victoria at Carlton Gardens, has been floated informally but has not progressed to a funded proposal.
The practical timeline is tight. Public Record Office Victoria's current digitisation tranche is scheduled to conclude by mid-2027. Any deduplication standard that isn't embedded in institutional workflows before that deadline will be playing catch-up against a much larger backlog. For smaller community archives, the window is even shorter — several are operating on project funding that expires before December 2026. The decisions being made in the next six months, largely out of public view, will determine whether Melbourne's digital cultural record becomes a coherent, searchable asset or an expensive, tangled archive that serves nobody particularly well.