Melbourne's public institutions are facing a cataloguing crisis years in the making. Duplicate digital images, meaning identical or near-identical photographs stored separately across disconnected systems, have quietly accumulated inside council archives, gallery databases and government record-keeping platforms, inflating storage costs and obscuring genuine historical material. The push to finally address the problem through systematic duplicate image replacement programs has arrived not a moment too soon.
The issue matters right now because Victoria's state government has accelerated its digitisation agenda since 2022, when the Public Record Office Victoria expanded its digital intake program to absorb backlogs from regional councils and cultural bodies. That rapid ingestion of material, without a unified deduplication protocol, seeded the current mess. Every institution that uploaded scanned photographs, event imagery or planning documents did so into its own silo, with no cross-referencing against what neighbours had already lodged.
How the Silos Were Built
Trace the problem back to the mid-2000s and it starts with good intentions. The City of Melbourne, the State Library of Victoria on Swanston Street, and bodies like Museum Victoria — now Museums Victoria, based at the Melbourne Museum in Carlton — each built their own digital asset management systems independently. At the time, interoperability standards were nascent and budgets were tight. Nobody mandated a shared spine.
By the time cloud storage became cheap enough that duplication carried a near-zero marginal cost per gigabyte, the habit was entrenched. Departments inside the same organisation were uploading the same photographs of events at Federation Square or in the Southbank precinct without checking whether a colleague had already filed an identical frame. The City of Yarra, the City of Port Phillip and several inner-north councils compounded the problem between 2016 and 2020, when community grants programs funded dozens of neighbourhood documentation projects, each producing image sets that were archived locally with no deduplication step.
The State Library flagged the scale of the redundancy problem as early as 2019 in its internal collection strategy documents, noting that storage inefficiency was beginning to affect retrieval performance and metadata integrity. When identical images carry different file names, different catalogue entries and sometimes conflicting rights metadata, researchers — and Freedom of Information applicants — can retrieve contradictory records for the same event or the same street corner.
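The simplest version of the problem, byte-identical files lodged under different names, can be surfaced by hashing file contents rather than trusting filenames or catalogue entries. The following is a minimal sketch of that idea, not a description of any institution's actual tooling; the function names and the choice of SHA-256 are illustrative assumptions.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's bytes in chunks so large TIFFs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files under root whose contents are byte-identical."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Files with the same digest land in the same group,
            # whatever they happen to be called.
            by_digest[sha256_of(path)].append(path)
    return [group for group in by_digest.values() if len(group) > 1]
```

A check like this only catches exact copies. A re-saved, resized or re-scanned version of the same photograph produces different bytes and would need a similarity-based comparison instead.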
The Cost of Doing Nothing
Storage costs are one dimension. Enterprise-grade archival storage, particularly for high-resolution TIFF files common in heritage digitisation, runs to several thousand dollars per terabyte per year for redundant off-site solutions. Institutions carrying tens of thousands of duplicate image files across multiple format versions are not dealing with a trivial line item.
The deeper cost is to discoverability. When the Shrine of Remembrance on Birdwood Avenue undertook a catalogue audit of its digital photographic holdings in recent years, staff found multiple entries representing the same physical photograph under different accession numbers — a predictable consequence of migration between systems without a merge-and-deduplicate phase. The Shrine's experience is not unusual; it is representative.
At the municipal level, duplicate images inside planning and building record systems create specific legal exposure. If two versions of the same site photograph carry different metadata timestamps, that discrepancy can complicate heritage assessments and permit disputes — an increasingly live concern in suburbs like Fitzroy and Brunswick where density reform is reshaping the planning caseload.
Victoria's broader digitisation program, administered through the Department of Government Services, is now incorporating mandatory deduplication checkpoints for new ingestion. Institutions receiving state funding for digitisation projects after 1 July 2026 must demonstrate a duplicate-checking protocol as a grant condition. That policy shift represents the clearest official acknowledgement yet that the problem requires structural correction, not just storage management.
For archivists, records managers and the organisations that rely on clean public data — journalists, researchers, legal practitioners and FOI applicants among them — the practical next step is an audit. Institutions that have not yet mapped their duplicate exposure should prioritise a holdings review before the next major ingestion cycle. The tools to do so, including perceptual hashing software capable of identifying visually identical images regardless of filename or metadata, are now standard in the sector. The question is whether the will and the budget follow the policy.
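For readers unfamiliar with the technique, perceptual hashing can be illustrated with the "average hash", one of the simplest members of the family. The sketch below is an assumption-laden illustration, not the method used by any particular product or institution: it takes a grayscale image already decoded into rows of 0 to 255 values (a real audit would use an imaging library for that step), and the function names are hypothetical.

```python
def average_hash(pixels: list[list[int]], size: int = 8) -> int:
    """Average hash of a grayscale image given as rows of 0-255 values.

    Assumes the image is at least size x size pixels.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    # Shrink the image to a size x size grid by averaging each block.
    for row in range(size):
        top, bottom = row * height // size, (row + 1) * height // size
        for col in range(size):
            left, right = col * width // size, (col + 1) * width // size
            block = [pixels[y][x] for y in range(top, bottom) for x in range(left, right)]
            cells.append(sum(block) / len(block))
    # One bit per cell: is it brighter than the overall mean?
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (value > mean)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")
```

Two images whose hashes differ by only a few bits are likely the same picture, even if one has been resized or brightened and the files carry unrelated names. The threshold for "a few" is a tuning decision each institution would need to make against its own holdings, with borderline matches reviewed by staff.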