Victoria's peak cultural bodies are now confronting a problem years in the making: vast digital collections riddled with duplicate images that consume server space, confuse researchers and undermine public access to the state's visual record. The push to fix it — through a process known as duplicate image replacement — is accelerating in mid-2026, but understanding how the archives got into this state requires going back at least two decades.
The problem matters now because three separate Victorian government reviews, completed between 2022 and early 2025, each flagged uncontrolled digital duplication as a priority risk for long-term collection integrity. Institutions that received digitisation funding under the former federal Cultural Infrastructure Program — which ran from 2017 to 2022 — scanned hundreds of thousands of items without any unified metadata standard. The result was the same photograph, the same architectural drawing, the same newspaper clipping turning up under different file names, different resolution tags and sometimes different rights classifications across multiple platforms simultaneously.
The Accumulation of Two Digitisation Booms
Melbourne's institutional archives hit peak duplication during two distinct periods. The first came after the State Library of Victoria on Swanston Street launched its mass digitisation push around 2009, converting fragile physical collections to JPEG and TIFF formats for the first time. The second followed the 2020 lockdowns, when institutions including Museums Victoria, based at the Melbourne Museum in Carlton Gardens, rapidly pushed collections online to maintain public engagement. Both booms moved fast and prioritised access over curation. Quality control, including checking whether an image already existed in the collection, was treated as something to fix later. Later kept getting deferred.
The State Library's Pictorial Collection alone is estimated to hold more than 800,000 digitised images. At that scale, manual deduplication is not a realistic option. Archivists at institutions including the Public Record Office Victoria, headquartered in North Melbourne, have described the duplication rate in some sub-collections as high as one in five files, though that figure has not been independently audited and should be treated as indicative rather than definitive. What is documented is the cost: a 2024 budget submission from the Victorian Government's Creative Victoria directorate cited storage expenditure across the state's major collecting institutions at roughly $2.3 million annually, with duplication identified as a key driver of unnecessary spend.
Software-based duplicate detection — comparing image files by hash value, perceptual similarity algorithms, or both — has existed for years in the commercial sector. The barrier in the public archive world has been procurement inertia and a lack of agreed standards for what constitutes a true duplicate versus a legitimately distinct version of an image. A photograph taken on the same day at the same location but at a different exposure, for instance, may carry independent archival value. Institutions have been reluctant to automate deletion without human sign-off on those edge cases, and that caution, while defensible, has slowed progress.
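To make the two detection approaches concrete, here is a minimal Python sketch, not any institution's actual tooling. It assumes each image has already been decoded and shrunk to a small grid of greyscale values; the function names, file names, sample pixels and the distance threshold are all hypothetical. Exact copies are grouped by SHA-256 digest, while near matches found by a simple average hash are only flagged for human review, never deleted automatically.

```python
import hashlib
from collections import defaultdict


def exact_groups(files):
    """Group file names whose bytes are identical (same SHA-256 digest)."""
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def average_hash(pixels):
    """Return one bit per pixel: 1 where the pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming(a, b):
    """Count the positions at which two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def needs_review(pixels_a, pixels_b, max_distance=2):
    """Flag a pair as a possible duplicate; the threshold is an assumption."""
    distance = hamming(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance


# Hypothetical sample data: two byte-identical files and one different file.
files = {"a.tif": b"abc", "b.tif": b"abc", "c.tif": b"xyz"}

# Hypothetical 4x4 greyscale grids: a scan, a brighter copy, and a negative.
original = [[10, 200, 10, 200], [200, 10, 200, 10],
            [10, 200, 10, 200], [200, 10, 200, 10]]
brighter = [[value + 10 for value in row] for row in original]
inverted = [[255 - value for value in row] for row in original]

print(exact_groups(files))
print(needs_review(original, brighter), needs_review(original, inverted))
```

The brighter copy would not share a file hash with the original, which is why a perceptual comparison is needed to catch it. Whether such a pair counts as a true duplicate or a distinct version is the judgement call the institutions have kept for human reviewers.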
Where the Process Stands Now
The Victorian Government's Digital Archives Modernisation Strategy, released in March 2025, set a target of completing duplicate image replacement protocols across the four major state collecting institutions by December 2027. The strategy does not mandate a single technology solution, but it does require institutions to adopt the Dublin Core metadata standard — a baseline schema used internationally — by mid-2026. That deadline lands this month.
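As an illustration of what adopting a baseline schema can involve, the sketch below checks a catalogue record against the fifteen elements of the Dublin Core Metadata Element Set. The choice of required fields, the function name and the sample record are assumptions made for this example; the strategy as described here does not say which elements institutions must populate.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# Assumption for illustration only: a local policy that treats
# these three elements as mandatory.
REQUIRED = ("title", "identifier", "rights")


def check_record(record):
    """Report fields outside Dublin Core and required fields left empty."""
    unknown = sorted(key for key in record if key not in DC_ELEMENTS)
    missing = [name for name in REQUIRED
               if not str(record.get(name, "")).strip()]
    return {"unknown": unknown, "missing": missing}


# Hypothetical record: no identifier, plus a non-standard "filename" field.
sample = {
    "title": "Street scene, Fitzroy",
    "identifier": "",
    "rights": "Public domain",
    "filename": "IMG_0042.tif",
}

print(check_record(sample))
```

A shared schema matters for deduplication because records describing the same image can be compared field by field only once they use the same element names.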
For researchers using Trove, the National Library of Australia's aggregation platform, or the State Library's own catalogue, the practical upshot should eventually be simpler searches and fewer instances of the same image appearing multiple times under unrelated subject tags. For institutions, it means leaner storage bills and cleaner rights management. Knowing exactly how many distinct images exist, and which are licensed for reuse, matters all the more as AI training datasets increasingly draw on public collections.
The immediate next step for anyone working with these collections — whether a journalist, a heritage architect sourcing historical photographs of Fitzroy streetscapes, or a student — is to check whether the item they are using carries a confirmed accession number from the originating institution. That single piece of metadata is the clearest signal that an image has been reviewed and confirmed as a unique, correctly attributed record rather than an unreplaced duplicate still floating in the system.