Session: Digitizing 125 Years of Images

Photographs first appeared in The New York Times on Sept. 6, 1896, and have been an integral part of our journalism ever since. The New York Times has a vast archive of physical photos stored in hundreds of file cabinets in the “morgue”, organized by folders. This archive has significant historical value: some of these photos can be found nowhere else in the world.

The Times began a project in 2018 to digitize these images in order to preserve them and make the archive easier to search and research. An index card catalogue was the only way to research the archive. The backs of the images and the folders containing them hold rich information that was not indexed in any form.

In this talk, we will review how we built a system to ingest millions of photos and make them findable. We will dive into challenges with machine-learned tools for image and text recognition. We will also tell you about how we repurposed analog metadata for a digital search experience.