Victoria's public sector is sitting on a cataloguing problem that has been building since at least the late 1990s. Duplicate images — the same photograph, scan, or graphic asset stored multiple times under different file names, in different databases, sometimes with contradictory metadata — have quietly multiplied across state government archives, council libraries, and cultural institutions. The cost of cleaning that up is now forcing agencies to confront a mess decades in the making.
The timing matters. Across Melbourne, a raft of institutions are midway through or approaching the end of significant digitisation contracts. The Public Record Office Victoria, based on Queensberry Street in Carlton, has been managing the migration of physical records to digital formats under rolling programs tied to broader Victorian Government ICT strategies. At the same time, local councils from Maribyrnong to Monash have been uploading planning permit images, heritage photography, and community event archives into shared content management systems. Every one of those workflows carried a risk: if two staff members scanned the same document on different days, or a contractor ingested files from a legacy drive without deduplication checks, the result was duplicate entries, sometimes hundreds of them for a single original image.
How the Duplication Problem Accumulated
The roots of the problem are straightforward, even if the fix is not. Through the 1990s and into the 2000s, most Melbourne councils and state agencies stored images on local servers or even physical media — CDs, DAT tapes, external hard drives. There was no centralised asset register. The State Library of Victoria on Swanston Street, for instance, manages collections that span multiple acquisition eras, each with its own cataloguing convention. When those collections were progressively digitised, staff often worked from incomplete records of what had already been processed. A photograph of Flinders Street Station from the 1920s might sit in three separate folders: one from a 1998 scanning grant, one from a 2007 heritage project, and one uploaded by a volunteer in 2015 under a digitisation partnership.
The problem was compounded by procurement choices. Different agencies bought different digital asset management platforms, sometimes within the same portfolio. A 2019 Victorian Auditor-General's Office report into records management across the public sector identified fragmentation of ICT systems as a recurring risk to data integrity, though it did not specifically quantify the scale of image duplication. By the mid-2020s, storage costs had dropped sharply enough that keeping duplicates felt cheaper than auditing them. That held until agencies started trying to search, license, or share assets and found the catalogues unnavigable.
Council archives present a particular challenge. The City of Melbourne's own digital library holds tens of thousands of images covering everything from 1950s Queen Victoria Market photographs to construction documentation for the Docklands precinct from the early 2000s. Staff tasked with retrieving a specific image for a public report or heritage assessment can spend hours confirming whether two visually identical files are actually the same original scan or subtly different copies at different resolutions.
What Comes Next for Institutions Trying to Clean House
The immediate practical step being adopted across several Victorian agencies is the deployment of perceptual hashing tools — software that generates a fingerprint for each image based on its visual content rather than its file name or metadata, flagging near-identical files for human review. These tools have been available commercially for years; the constraint has been staff time and institutional will to run a systematic audit rather than a spot-check.
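To make the fingerprint idea concrete, here is a minimal sketch of one common perceptual hashing scheme, an average hash, written in plain Python. It is an illustration under stated assumptions, not a description of the software any Victorian agency actually runs: images are assumed to be already decoded into grids of grayscale values, and the function names and the review threshold of 5 differing bits are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Assumption: each image is already decoded into rows of 0-255 grayscale values.
Gray = List[List[int]]


def shrink(pixels: Gray, size: int = 8) -> List[float]:
    """Reduce an image to size x size cells by averaging blocks of pixels."""
    h, w = len(pixels), len(pixels[0])
    cells = []
    for r in range(size):
        r0 = r * h // size
        r1 = max((r + 1) * h // size, r0 + 1)  # keep at least one row per cell
        for c in range(size):
            c0 = c * w // size
            c1 = max((c + 1) * w // size, c0 + 1)  # at least one column
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    return cells


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Fingerprint: one bit per cell, set when the cell is brighter than the mean."""
    cells = shrink(pixels, size)
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def flag_near_duplicates(
    hashes: Dict[str, int], max_distance: int = 5  # hypothetical threshold
) -> List[Tuple[str, str]]:
    """Return pairs of names whose fingerprints are close enough to review."""
    return [
        (a, b)
        for a, b in combinations(sorted(hashes), 2)
        if hamming(hashes[a], hashes[b]) <= max_distance
    ]
```

In this sketch, a scan and a copy of the same scan at a different resolution would yield fingerprints within a few bits of each other, so the pair is flagged for human review rather than deleted automatically. File names and metadata play no part in the comparison.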
For smaller organisations — community arts groups in Collingwood, migrant resource centres in Footscray, neighbourhood houses across the northern suburbs — the issue is less about technology than about capacity. Many rely on volunteer cataloguers and free-tier cloud storage. They have no deduplication workflow at all.
The practical advice from records management professionals is blunt: establish a single source-of-truth folder before ingesting any new batch of images, assign unique identifiers at the point of creation rather than the point of filing, and run a deduplication check before any major platform migration rather than after. Victorian agencies now facing costly clean-up projects wish someone had insisted on that sequence fifteen years ago.
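That sequence can be sketched in a few lines of Python. What follows is a hypothetical illustration, not any agency's actual workflow: the `AssetRegister` class and its `ingest` method are invented for this example, and the sketch assumes that a SHA-256 digest of the file's bytes is an acceptable test for exact duplicates.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class AssetRegister:
    """Hypothetical single source-of-truth register, keyed by content digest."""

    by_digest: Dict[str, str] = field(default_factory=dict)  # sha256 -> asset id

    def ingest(self, data: bytes) -> Tuple[str, bool]:
        """Register a file's bytes; return (asset_id, is_new).

        The duplicate check runs before anything is stored, and a new
        identifier is assigned only when the content has not been seen.
        """
        digest = hashlib.sha256(data).hexdigest()
        existing = self.by_digest.get(digest)
        if existing is not None:
            return existing, False  # exact duplicate: reuse the existing id
        asset_id = str(uuid.uuid4())  # identifier assigned at creation
        self.by_digest[digest] = asset_id
        return asset_id, True
```

A byte-level digest catches only exact copies. A rescan of the same photograph at a different resolution produces different bytes, which is why perceptual hashing and human review remain part of the clean-up.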