Melbourne's public sector is confronting a surprisingly stubborn problem: thousands of duplicate digital images clogging archival systems, eating into storage budgets and burying the authentic records that residents, researchers and journalists actually need. The issue gained fresh urgency this week, as two major Victorian institutions detailed dedicated remediation programs aimed at cleaning up overlapping files across their collections before the end of the 2026 financial year.
The timing matters. State Library Victoria, which holds more than 2.5 million digitised items at its Swanston Street building, confirmed on Thursday that it is midway through a systematic duplicate-detection audit launched in April. Separately, the City of Melbourne said its Digital Heritage unit, which operates out of the Town Hall precinct on Collins Street, is trialling automated deduplication software across its photographic holdings. The trial is expected to free significant server capacity ahead of a planned infrastructure refresh in the September quarter.
Why Duplicate Images Became a Crisis
The roots of the problem are not complicated. Over roughly fifteen years of aggressive digitisation, institutions uploaded the same image multiple times: once from a physical scan, again from a donated digital copy, and sometimes a third time after a format migration. Nobody flagged the overlap because the file names differed and the metadata was inconsistent. The result was collections swollen with near-identical files that consumed cloud and on-premises storage without adding research value.
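The failure described above, identical content hidden behind different file names, is the easiest kind to catch. A minimal Python sketch, assuming the files can be read as raw bytes, groups them by a SHA-256 digest of their contents so that names play no part in the match. The file names and byte strings are placeholders, not real collection data.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical.

    `files` maps a file name to its raw bytes. Names are ignored for matching;
    only the SHA-256 digest of the contents decides group membership.
    """
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    # Only digests shared by two or more files indicate duplicates.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


if __name__ == "__main__":
    # Placeholder contents standing in for real image files.
    collection = {
        "scan_1987_0042.tif": b"...pixel data A...",
        "donated_copy_final.tif": b"...pixel data A...",
        "IMG_0042_migrated.tif": b"...pixel data A...",
        "tram_swanston_st.tif": b"...pixel data B...",
    }
    print(group_exact_duplicates(collection))
    # [['IMG_0042_migrated.tif', 'donated_copy_final.tif', 'scan_1987_0042.tif']]
```

Exact digests only catch byte-identical copies. A format migration or a fresh scan changes every byte even when the picture is the same, which is why near-duplicate detection needs a different kind of hash.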
Nationally, the scale of the problem has become clearer. A 2025 report by the Australian Research Data Commons, which is based in Melbourne and Sydney, found that duplicate and near-duplicate files accounted for between 18 and 30 percent of the storage load across the cultural heritage repositories it surveyed. That figure alarmed IT managers already watching per-unit storage costs climb on the AWS and Azure services commonly used by Victorian government bodies.
At the State Library, the deduplication project is being handled by an internal digital preservation team using software that compares content-based image hashes (fingerprints of what a picture looks like) rather than file names. Because visually similar images produce similar fingerprints, near-identical files scanned from the same negative at different resolutions are caught as well. The library has not yet published a final tally of duplicates removed, but the audit covers holdings including the La Trobe Picture Collection, which documents Melbourne street life dating to the 1860s.
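The library has not named its software, so the following is only an illustration of the general idea, using average hashing, one simple member of the perceptual-hash family, as a stand-in. Each image is shrunk to an 8×8 grid of grayscale averages, and each cell becomes one bit depending on whether it is brighter than the grid's mean; two files are compared by counting the bits that differ (the Hamming distance). Images are modelled as plain lists of grayscale values so the sketch runs without an imaging library, and all helper names are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def downscale(pixels: Grid, size: int = 8) -> Grid:
    """Shrink a grayscale image to size x size by averaging equal blocks.

    Simplifying assumption: height and width are exact multiples of `size`.
    """
    bh, bw = len(pixels) // size, len(pixels[0]) // size
    small = []
    for by in range(size):
        row = []
        for bx in range(size):
            block = [pixels[y][x]
                     for y in range(by * bh, (by + 1) * bh)
                     for x in range(bx * bw, (bx + 1) * bw)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Return a size*size-bit fingerprint: 1 where a cell is brighter than the mean."""
    cells = [v for row in downscale(pixels, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def upscale(pixels: Grid, factor: int) -> Grid:
    """Enlarge by pixel repetition, standing in for a higher-resolution copy."""
    return [[v for v in row for _ in range(factor)]
            for row in pixels for _ in range(factor)]


if __name__ == "__main__":
    # A synthetic 16x16 grayscale "scan" and a 2x enlargement of it.
    scan = [[(x * 16 + y * 3) % 256 for x in range(16)] for y in range(16)]
    larger = upscale(scan, 2)
    print(hamming(average_hash(scan), average_hash(larger)))  # 0
    # A genuinely different picture (here, the inverted scan) should differ in most bits.
    inverted = [[255 - v for v in row] for row in scan]
    print(hamming(average_hash(scan), average_hash(inverted)))
```

In this idealised example the enlargement yields exactly the same 64-bit hash. Real rescans of the same negative would usually differ by a few bits, so tools of this kind typically treat files as near-duplicates when the distance falls under a small threshold; any particular threshold is an assumption to be tuned, not a fixed rule.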
Local Programs Taking Action
The City of Melbourne's Digital Heritage unit is running a parallel but distinct effort. Rather than simply deleting duplicate files, the unit is using the remediation process to fill in incomplete metadata, adding neighbourhood names, street-level location data and subject tags, before retaining a single canonical version of each image. Staff working from the Town Hall's Level 4 archive room began the pilot in late May, focusing first on images of the CBD and the inner-north suburbs of Fitzroy and Carlton.
Public Record Office Victoria, based at the Victorian Archives Centre on Shiel Street in North Melbourne, has been advising both institutions on retention policy. Under Victorian records law, institutions must retain original master files even when duplicates are removed, so the storage savings come from culling secondary and tertiary copies rather than primary records. That constraint limits how aggressive any cleanup can be.
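The City of Melbourne's approach (enrich the metadata first, then keep one canonical version) and the retain-the-master rule can be expressed together as a small planning step. This Python sketch is hypothetical throughout: the `ImageRecord` fields, the role labels and the choice of the first master as the canonical record are assumptions for illustration, not either institution's actual schema. It folds tags and suburb from every copy onto the master, never marks a master for deletion, and reports how much storage the culled copies would free.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class ImageRecord:
    """Hypothetical catalogue record for one stored file."""
    file_id: str
    role: str  # assumed labels: "master", "secondary" or "tertiary"
    size_bytes: int
    tags: Set[str] = field(default_factory=set)
    suburb: Optional[str] = None


def plan_cull(group: List[ImageRecord]) -> Tuple[ImageRecord, List[ImageRecord], int]:
    """Plan cleanup of one duplicate group without ever deleting a master file.

    Returns the canonical record (enriched with metadata from every copy),
    the non-master copies eligible for deletion, and the bytes they occupy.
    """
    masters = [r for r in group if r.role == "master"]
    if not masters:
        # No identifiable original: leave the whole group for manual review.
        raise ValueError("duplicate group has no master file")
    canonical = masters[0]  # assumption: the first master is the canonical copy

    # Fold metadata from all copies onto the canonical record before culling.
    for record in group:
        canonical.tags |= record.tags
        if canonical.suburb is None and record.suburb is not None:
            canonical.suburb = record.suburb

    # Only non-master copies may be deleted; every master is retained.
    to_delete = [r for r in group if r.role != "master"]
    freed = sum(r.size_bytes for r in to_delete)
    return canonical, to_delete, freed


if __name__ == "__main__":
    group = [
        ImageRecord("m-001", "master", 120_000_000, {"tram"}),
        ImageRecord("d-014", "secondary", 40_000_000, {"street scene"}, "Fitzroy"),
        ImageRecord("d-231", "tertiary", 8_000_000, {"1930s"}),
    ]
    canonical, to_delete, freed = plan_cull(group)
    print(canonical.file_id, sorted(canonical.tags), canonical.suburb)
    print([r.file_id for r in to_delete], freed)  # ['d-014', 'd-231'] 48000000
```

Refusing to act on a group with no master is a deliberate choice in this sketch: if the original cannot be identified, the safe outcome under a retain-the-master rule is to leave the group for a person to review.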
For smaller organisations — community archives, ethnic cultural centres along Sydney Road in Brunswick, local history groups in places like Williamstown — the resources to run formal deduplication projects simply do not exist. Several such groups have contacted the Public Record Office this year seeking guidance on low-cost tools, according to information published on the Office's website in June 2026.
The practical advice from archivists who have worked through similar projects elsewhere is consistent: start with the most recently uploaded collections, where duplicate rates tend to be highest, and establish a file-naming convention before any new material enters the system. Once a backlog has accumulated across a decade of uploads, the cost of remediation multiplies quickly. For Melbourne's institutions, the clock is now running: the State Library expects to publish interim audit findings by late August.
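A naming convention pays off at ingest because it lets a system spot a second copy of the same item before it is stored. The convention in this Python sketch (collection code, year, six-digit item number, then `master` or `access`) is invented purely for illustration; the point is that repeats are keyed on the item's identity rather than on free-text names.

```python
import re
from typing import List, Tuple

# Hypothetical convention, for illustration only:
#   <COLLECTION>_<YEAR>_<ITEM>_<VARIANT>.<EXT>, e.g. LTPC_1934_000412_master.tif
NAME_PATTERN = re.compile(
    r"(?P<collection>[A-Z]{2,6})_(?P<year>\d{4})_(?P<item>\d{6})_"
    r"(?P<variant>master|access)\.(?P<ext>tif|jpg)"
)


def ingest(filenames: List[str]) -> Tuple[List[str], List[str], List[str]]:
    """Sort incoming names into accepted, rejected (bad name) and flagged (repeat item)."""
    seen = set()
    accepted, rejected, flagged = [], [], []
    for name in filenames:
        match = NAME_PATTERN.fullmatch(name)
        if match is None:
            rejected.append(name)  # does not follow the convention
            continue
        # Identity is collection + item number + variant; year and extension are
        # ignored, so a JPEG re-upload of an existing TIFF master is still caught.
        key = (match["collection"], match["item"], match["variant"])
        if key in seen:
            flagged.append(name)
        else:
            seen.add(key)
            accepted.append(name)
    return accepted, rejected, flagged


if __name__ == "__main__":
    batch = [
        "LTPC_1934_000412_master.tif",
        "LTPC_1934_000412_access.jpg",
        "LTPC_1934_000412_master.jpg",  # same item and variant, different format
        "tram photo final v2.jpg",      # free-text name
    ]
    accepted, rejected, flagged = ingest(batch)
    print("accepted:", accepted)
    print("rejected:", rejected)
    print("flagged:", flagged)
```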