The State Library of Victoria holds more than two million digitised images in its online catalogue. Buried inside that archive, according to library staff who work the collections desk on Swanston Street, are thousands of duplicate or near-duplicate files: scanned twice, uploaded under variant filenames, or ingested from separate donor collections without cross-referencing. The library confirmed it is midway through a structured deduplication audit, with completion targeted for mid-2027.
The problem is not unique to one institution, but Melbourne is arriving late to a fix that Amsterdam, Seoul and Toronto each began in earnest between 2021 and 2023. With AI image generation accelerating the rate at which visual content enters both public and private databases, the gap between cities that have systematic deduplication pipelines and those still patching manually is widening fast.
What Deduplication Actually Costs — and Why Melbourne Fell Behind
Duplicate image management sounds like a filing problem. It is actually a budget problem. The City of Amsterdam's municipal archive, the Stadsarchief, completed a two-year project in 2023 that eliminated roughly 340,000 redundant image files from its publicly accessible Beeldbank database, freeing an estimated 4.2 terabytes of server capacity and cutting retrieval query times by around 30 percent, according to figures the archive published on its open-data portal that year.
Seoul's metropolitan government funded a dedicated visual-asset deduplication program under its 2022 Smart City data governance framework, allocating the equivalent of approximately A$2.8 million across 14 municipal departments. Toronto Public Library, dealing with a similar backlog after merging four separate digital branch collections in 2021, contracted a specialist firm to run perceptual-hashing software across its entire image repository. The process took eight months and, according to the library's 2022–23 annual report, flagged more than 18 percent of total holdings as duplicates.
Melbourne's institutions have not moved at that pace. The Victorian government's Coordinated Public Records Program, administered through the Public Record Office Victoria in North Melbourne, covers policy guidance on digital recordkeeping but does not mandate active deduplication timelines for collecting institutions. Museums Victoria, which oversees the Melbourne Museum in Carlton and the Immigration Museum on Flinders Street, confirmed it uses its Collections Online system to flag potential duplicates, but declined to provide figures on the scale of its current backlog.
The Pressure Point: AI and Housing-Era Photography
Two specific pressures are making the problem harder to defer. First, AI image tools now allow staff in teams such as the City of Melbourne's own digital content unit to generate dozens of visual assets in minutes. Without a strict naming and hashing protocol in place at the point of ingest, duplicates accumulate faster than any retrospective audit can clear them. Second, Victoria's housing density reform push has produced a surge in planning-related aerial and site photography submitted to councils across inner-city suburbs from Fishermans Bend to Brunswick, much of it unindexed and duplicated across multiple departmental systems.
The RMIT University library on Swanston Street has taken what its digital services team describes internally as a proactive stance, integrating open-source perceptual-hashing tools into its image ingest workflow from January 2026. The university has not published outcome data yet, but the approach mirrors what Toronto Public Library deployed and what the National Library of New Zealand began piloting in Wellington in late 2024.
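For readers unfamiliar with the technique, perceptual hashing reduces an image to a short fingerprint that stays nearly the same when the file is rescanned, resized or slightly brightened, so near-duplicates can be caught by comparing fingerprints rather than filenames. The sketch below illustrates one simple variant, the average hash, in Python. It is not the software used by any institution named here, none of which has specified its algorithm. The function names, the 8-by-8 grid and the threshold of 5 differing bits are illustrative assumptions, and the input is assumed to be an image already decoded into rows of 0 to 255 grayscale values.

```python
from typing import List, Sequence

# Assumption: an image is supplied as rows of 0-255 grayscale values.
Gray = Sequence[Sequence[int]]


def shrink(pixels: Gray, size: int = 8) -> List[List[float]]:
    """Downscale to a size x size grid by averaging each block of pixels."""
    h, w = len(pixels), len(pixels[0])
    grid = []
    for by in range(size):
        y0 = by * h // size
        y1 = max((by + 1) * h // size, y0 + 1)  # at least one row per block
        row = []
        for bx in range(size):
            x0 = bx * w // size
            x1 = max((bx + 1) * w // size, x0 + 1)  # at least one column
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        grid.append(row)
    return grid


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Return a size*size-bit fingerprint: 1 where a cell is brighter than the mean."""
    cells = [v for row in shrink(pixels, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions at which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Gray, b: Gray, threshold: int = 5) -> bool:
    """Hypothetical ingest check: flag a pair whose fingerprints differ by few bits."""
    return hamming(average_hash(a), average_hash(b)) <= threshold
```

In an ingest workflow of the kind described, each incoming file's fingerprint would be compared against those already stored, and any pair under the threshold would be sent for human review rather than deleted automatically. A threshold set too high merges images that are merely similar, and one set too low misses rescans.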
For Melbourne to close the gap on peer cities, collections and IT leads say the priority is policy before technology. Amsterdam's Stadsarchief did not succeed because it bought better software — it succeeded because the municipal council mandated deduplication as a condition of annual digital infrastructure funding from 2021 onwards. That kind of legislative hook does not currently exist in Victoria.
The State Library's audit is scheduled to publish interim findings in October 2026. If those findings prompt Public Record Office Victoria to update its guidance, collecting institutions across the city could have a new compliance baseline by early 2027 — roughly four years behind where Amsterdam landed, and two behind Toronto. Organisations managing large image libraries are advised to begin perceptual-hash audits now rather than wait for state guidance to catch up.