Melbourne's public digital archives contain tens of thousands of duplicate photographs, scanned documents and heritage images, a sprawling, expensive problem that the City of Melbourne and the Public Record Office Victoria are now formally trying to fix. The deduplication push, which combines automated detection software with manual curatorial review, is costing ratepayers and the state government money at a time when digital infrastructure budgets are already stretched.
Why now? The pressure has intensified because several major institutions — including the State Library Victoria on Swanston Street and the Australian Centre for the Moving Image in Federation Square — have been consolidating their digital collections onto shared platforms since 2024. When collections merge, duplicate images surface in bulk. Institutions that once managed siloed databases suddenly find the same photograph of Flinders Street Station appearing dozens of times under different file names, metadata tags and accession numbers.
What Melbourne Is Doing — And What It Is Not
The City of Melbourne began a structured duplicate-image audit in late 2025, targeting its Melbourne Heritage Register photography holdings. The program uses a perceptual hashing algorithm to flag near-identical images for human review. The technique reduces each image to a compact fingerprint of its visual content, so copies can be matched even when their file names, formats or resolutions differ. Public Record Office Victoria has been running a parallel effort, focused on deduplicating digitised council meeting records and planning permit photographs held across regional and metropolitan collections.
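For readers curious how perceptual hashing works in practice, here is a minimal sketch of one common variant, the "average hash". It is an illustration only, not the algorithm the City of Melbourne has said it uses. The function names, the 8x8 grid and the distance threshold of 5 are all assumptions chosen for the example, and images are represented as plain grids of grayscale values so the code runs without any imaging library.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixel values (0-255), one inner list per row


def shrink(pixels: Grid, size: int = 8) -> Grid:
    """Downscale to size x size by averaging rectangular blocks of pixels.

    Assumes the image is at least `size` pixels wide and tall.
    """
    height, width = len(pixels), len(pixels[0])
    small = []
    for by in range(size):
        y0, y1 = by * height // size, (by + 1) * height // size
        row = []
        for bx in range(size):
            x0, x1 = bx * width // size, (bx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) // len(block))
        small.append(row)
    return small


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Fingerprint an image: one bit per cell, set when the cell is brighter than the mean."""
    flat = [value for row in shrink(pixels, size) for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions at which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, threshold: int = 5) -> bool:
    """Flag a pair for human review when their fingerprints differ in few bits.

    The threshold of 5 is an illustrative assumption, not a recommended setting.
    """
    return hamming(average_hash(a), average_hash(b)) <= threshold
```

Because the fingerprint records only whether each region is brighter or darker than the image's average, a rescanned or slightly brightened copy tends to produce the same bits, while a genuinely different picture does not. That is also why flagged pairs still go to a human reviewer: a small distance suggests a duplicate but does not prove one.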
Fitzroy's Comune project, a community-run digital archive operating out of a studio on Smith Street, began dealing with the problem earlier and more informally. Volunteers there started tagging duplicates by hand in 2023 after discovering that a single 1970s-era photograph of Gertrude Street had been uploaded 47 times across three contributing collections. The experience shaped their submission to the Victorian Government's Digital Archives Strategy consultation in March 2026.
Compare that to Amsterdam. The Amsterdam City Archives (Stadsarchief Amsterdam) completed a city-wide deduplication of its 820,000-image photographic collection in 2023 using open-source tools developed by the Dutch national heritage sector. The project cost approximately €340,000 and removed roughly 14 per cent of the archive's total image count, or about 115,000 images, as confirmed duplicates or near-duplicates. London's Wellcome Collection took a different path, embedding deduplication checks directly into its ingestion pipeline so new uploads are screened before they enter the collection, not after.
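Screening at the point of ingestion, the approach attributed to the Wellcome Collection above, can be sketched in a few lines. This is a hypothetical illustration and not that institution's actual pipeline: the class name, the result fields and the two-stage design (an exact byte-level check, then a fingerprint comparison) are assumptions made for the example. The perceptual fingerprint is taken as a precomputed integer.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IngestResult:
    accepted: bool
    reason: str
    matches: List[str] = field(default_factory=list)


class IngestionScreen:
    """Hypothetical gatekeeper that checks each upload before it joins the collection."""

    def __init__(self, max_distance: int = 5) -> None:
        self.max_distance = max_distance          # illustrative threshold, in differing bits
        self._by_digest: Dict[str, str] = {}      # SHA-256 hex digest -> record id
        self._fingerprints: Dict[str, int] = {}   # record id -> perceptual fingerprint

    def submit(self, record_id: str, data: bytes, fingerprint: int) -> IngestResult:
        # Stage 1: byte-for-byte copies are rejected outright.
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._by_digest:
            return IngestResult(False, "exact duplicate", [self._by_digest[digest]])

        # Stage 2: visually similar files are held for a curator rather than rejected.
        near = [
            rid
            for rid, known in self._fingerprints.items()
            if bin(known ^ fingerprint).count("1") <= self.max_distance
        ]
        if near:
            return IngestResult(False, "held for review", near)

        # No match found, so the file is admitted and indexed for future checks.
        self._by_digest[digest] = record_id
        self._fingerprints[record_id] = fingerprint
        return IngestResult(True, "accepted")
```

The linear scan over stored fingerprints keeps the sketch short; a collection of hundreds of thousands of files would need a proper similarity index. The design point is that only exact copies are turned away automatically, while near matches are routed to a person.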
The Global Gap Melbourne Needs to Close
Seoul's National Folk Museum completed its own deduplication project in 2024, using AI-assisted matching that also identified culturally distinct images that appeared superficially identical — a subtlety that straight pixel-comparison tools miss. That project reportedly cleared a backlog stretching back to the late 1990s, when early digitisation programs produced inconsistent file standards.
Melbourne's challenge is partly structural. Unlike Amsterdam or Seoul, where archive governance sits within a single municipal authority, Victoria's digital heritage landscape is split across the City of Melbourne, Public Record Office Victoria, the State Library, and dozens of regional and community archives. Coordinating deduplication standards across those bodies has proved difficult. There is no single shared metadata schema, which means an image flagged as a duplicate in one system may carry unique descriptive information in another.
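The risk described above, that removing a duplicate also discards description held only in one system, is usually handled by merging records before any file is retired. The sketch below shows the idea under stated assumptions: the field names (title, accession, date, photographer) are invented for the example and do not come from any Victorian institution's schema, and each record is treated as a simple set of text fields.

```python
from typing import Dict, List

Record = Dict[str, str]


def merge_records(records: List[Record]) -> Dict[str, List[str]]:
    """Combine the metadata of files judged to be the same image.

    Every distinct non-empty value is kept, in the order first seen, so that
    conflicting accession numbers or captions survive for a curator to resolve.
    """
    merged: Dict[str, List[str]] = {}
    for record in records:
        for key, value in record.items():
            if not value:
                continue  # skip blank fields rather than recording empty entries
            values = merged.setdefault(key, [])
            if value not in values:
                values.append(value)
    return merged


# Hypothetical example: the same photograph catalogued by two collections.
city_copy = {"title": "Gertrude Street", "accession": "A-1", "photographer": ""}
community_copy = {"title": "Gertrude Street", "accession": "B-9", "date": "1970s"}
combined = merge_records([city_copy, community_copy])
```

Here the combined record keeps both accession numbers and gains the date that only one collection recorded. Without a shared schema, deciding which fields in two systems actually mean the same thing remains a manual job, which is the gap a common metadata standard is meant to close.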
The Victorian Government's Digital Archives Strategy, released in draft form in early 2026, proposes a common metadata standard for state-funded institutions by the end of 2027. Whether smaller community archives like Comune in Fitzroy will be brought inside that framework, or left to manage the problem independently, is still being worked through.
For institutions and community groups navigating this now, the practical advice from archivists with experience in the Amsterdam and Wellcome projects is consistent: start with ingestion controls, not retrospective cleanup. Preventing duplicates at the point of upload is cheaper and more accurate than hunting them down across an existing collection of hundreds of thousands of files. For Melbourne's heritage sector, the window to build those controls into new shared platforms — before another wave of collection mergers arrives — is narrow and closing.