## Organising digital documents for genealogy and family history?

21

2

I have been trying for over 15 years now to come up with a system of folders for my family documents. As I'm sure most of you know it can be difficult.

Without making duplicates of each document, how do you sort your files? If I have a birth certificate for someone, I also want to include that document in the parents' folder, but I don't want duplicates floating around. Additionally, when someone gets married and starts a family of their own, two previous families are combined.

How can I organise my electronic files so that they're associated with all the appropriate individuals but are not duplicated? What are the advantages and disadvantages of not duplicating the files?

Question was closed 2017-10-11T18:13:10.320

2

Steven, your question invites expressions of personal opinion rather than a definitive answer based upon evidence and expertise. Perhaps you could rephrase it after reading the FAQ at http://genealogy.stackexchange.com/faq#dontask

– Fortiter – 2013-01-09T00:59:41.260

– lkessler – 2013-01-09T06:21:21.253

Steven, I've edited your question to make it more specific. I hope I've preserved your intent; please correct me if I've got that wrong. – None – 2013-01-09T11:14:50.643

15

The best way to organise your digital documents is the one that matches the way in which you work. It will enable you to store, search, analyse and display information in the shortest time and with the least effort because it matches the way you think.

Unfortunately, few of us begin genealogy with a crystal-clear and firmly fixed idea of how we will go about the tasks that confront us. So we borrow someone else's file system or seek recommendations from experts.

But the way in which your information is stored can also shape the way in which you use it. Some relationships between people and events jump out of the files I have stored. Others are hidden away because I tend not to look at, or even to think about, some documents at the same time.

That explains why experienced and knowledgeable genealogists have responded to your question in such very different ways. In each case, they have described what works for them. If you think like the writer, then his or her scheme is the one for you. If it turns out that you don't agree on the basic philosophy, then that filing system might be a source of frustration for you.

In the end, you need to make a choice and try it out. You will learn which aspects are terrific for you, which are mildly annoying and which (if any) will make you think about giving up family history entirely. Then you will have a better idea of how you like to work and what style of file system to switch to.

The great thing about this site is that you can gain a very good appreciation of the philosophies and practices of each of the people who have proposed a system by reading their "collected works" in the archives. Track the contributions of each one to decide if you think that their approach is similar to the way you work (or would like to work). Then follow the recommendation of the one who is "best for you".

I will not offer a recommendation because my filing system continues to evolve. I read aspects of each of the suggestions made and say "Wow, that is great" and follow up with "But...". An unkind observer might call my files a mish-mash of systems; I say that reflects the fact that I have an eclectic approach to family history practice. It may not be perfect but it supports the things that I want it to do. And that might be a realistic goal for you.

What do modern genealogists do when they have no internet connection? They reorganise their files, again.

14

I think the right way is with source-based folder organization. The highest level is the source type. Next level depends on the source type, but it could be the location, jurisdiction, person, repository, or whatever is applicable. It will become clear once you try it.

e.g.

BirthRecords
DeathRecords
MarriageRecords
    Kentucky
CensusRecords
    1870
        UnitedStates
    1880
    1890
Photos
    (by name of photographer)
Correspondence
    (by name of contact)
Online
    FamilySearch
    Ancestry
Interviews
Archives
Libraries


... You get the idea.

If you don't like the order because folders are sorted alphabetically, then you can add a letter or number prefix to put them in the order you want, e.g.

01-BirthRecords
02-MarriageRecords
03-DeathRecords
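
As a rough illustration of the prefixing idea, here is a minimal Python sketch that creates folders with two-digit prefixes in a chosen order. The function name `make_prefixed_folders` is hypothetical, and the demonstration runs in a temporary directory rather than a real genealogy folder.

```python
from pathlib import Path
import tempfile


def make_prefixed_folders(root, names):
    """Create one folder per name under root, prefixed 01-, 02-, ... in the given order."""
    created = []
    for position, name in enumerate(names, start=1):
        folder = Path(root) / f"{position:02d}-{name}"
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder.name)
    return created


# Demonstration in a throwaway directory, so nothing real is touched.
with tempfile.TemporaryDirectory() as demo_root:
    created = make_prefixed_folders(
        demo_root, ["BirthRecords", "MarriageRecords", "DeathRecords"]
    )
    # An alphabetical listing now matches the order that was asked for.
    listing = sorted(p.name for p in Path(demo_root).iterdir())
    print(listing)
```

Zero-padding the number keeps the alphabetical sort correct once there are ten or more folders.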


With source-based data organization, you know where to file everything and there will always be just one place to file it. This compares to family- or name-based organization which requires multiple copies or links to source information that pertains to many people.

Very often, we genealogists have to go back and look at a source again, either to check what we originally got, to see if we missed anything, or to do a new search of it for information that we weren't looking for previously. With source-based organization, you will have in one place all the information you previously found, and you will be able to easily determine what you don't have and now need.

p.s. I recommend storing physical items in binders the same way.

1

While this is a beautifully clean system with a single unambiguous location for every document, it does appear to preclude browsing by person (as in, "What do I have on Aunt Minnie?"). For some people that may be too steep a price to pay for an elegant organisation. – Fortiter – 2013-01-09T04:51:16.513

7

@Fortiter - That is what your genealogy software is for. To provide the index into your source materials. Go to Aunt Minnie in your genealogy program. See what sources hang off her. But if you try to store everything on Aunt Minnie in an Aunt Minnie folder, then the Census record will have to be copied 8 times (for her husband and 6 kids). Birth certificates 3 times (each parent and the child), etc. – lkessler – 2013-01-09T05:00:46.147

1

I organize my documentation in a similar way (by events and sources). Filenames include the appropriate details. Two advantages, not mentioned by Mr. Kessler, are 1) the ability to have multiple documents referencing the same event, i.e. obituaries from 2 different newspapers and the funeral home, and 2) the option to store shorter excerpts (indexes, transcripts) as collections in a text file. – bgwiehle – 2013-01-09T15:00:20.057

I organize by source and repository. That system scales up better than any other I've tried so far. To answer Fortiter's question of how to browse by person, I am currently experimenting with the writing software Scrivener from Literature and Latte http://www.literatureandlatte.com/ to keep notes about source material which has not been analyzed yet and won't be attached to that person in my genealogy software. Scrivener is designed for research and can be a virtual 'binder' holding both your research notes and the supporting documents you might want to incorporate in any reports.

– Jan Murphy – 2013-11-30T18:11:49.020

9

I assume you're talking about ordinary folders on the disk of your computer, Steven. If so then there is no ideal way of organising/distributing your files between them. As you point out yourself, a simple hierarchical arrangement of folders does not easily model the associations within your family tree.

Most folder systems allow you to make multiple entries for the same file. In other words, allowing the name of your file to be entered into the directory or catalog(ue) of multiple folders. This contrasts with making physical duplicates of the file itself. However, support for this core functionality through the user interface of operating systems like Windows seems to be vanishing [I might just be unaware of where they've hidden it these days].
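
The paragraph above does not name a specific mechanism. One that fits the description is a hard link, which gives the same file a second directory entry without copying its contents. The following is a minimal Python sketch under that assumption; the helper `add_directory_entry` and the folder names `Child` and `Parents` are hypothetical. Hard links generally require both folders to be on the same volume, and not every file system supports them.

```python
import os
import tempfile
from pathlib import Path


def add_directory_entry(existing_file, extra_folder):
    """Give existing_file a second name inside extra_folder without copying its contents."""
    extra_folder = Path(extra_folder)
    extra_folder.mkdir(parents=True, exist_ok=True)
    link_path = extra_folder / Path(existing_file).name
    os.link(existing_file, link_path)  # hard link: same data, second directory entry
    return link_path


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    original = Path(root) / "Child" / "birth-certificate.txt"
    original.parent.mkdir()
    original.write_text("scan placeholder")

    linked = add_directory_entry(original, Path(root) / "Parents")

    # Both names refer to one underlying file, so there is no duplicate to keep in sync.
    same_file = os.path.samefile(original, linked)
    link_count = original.stat().st_nlink
    print(same_file, link_count)
```

A Windows shortcut is a different thing: it is a small separate file that points at the original, whereas a hard link is another name for the same data.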

Even with this facility for avoiding duplication, though, there is still no ideal. I organise my data mainly on a surname basis, i.e. a main folder for each distinct surname (rapidly approaching the 50 mark), and sub-folders for the different types of file associated with them. This is a very coarse organisation and obviously finds ambiguity where a document applies to two surnames, such as a marriage certificate.

This is one of the major benefits of using a software program to organise your data, and to keep track of those associations and even different file versions. The way the data is physically held on disk, or in some database, is then largely irrelevant to you - the need for performing backups being a possible exception. A good software product then presents a virtual organisation of your data to you that is more closely related to family trees and to genealogy.

+1 for the suggestion of using pointers (Windows shortcuts) instead of duplicating files. – Jan Murphy – 2014-07-05T13:16:14.220

9

1. Use your genealogy/family history software to organise your electronic documents (by associating them with the appropriate sources/individuals/places or other entities in your database).
2. Underpin it with as simple a folder structure as you can devise. I structure mine by high-level document type (BMD record, census record, MI, Will, Land record, photograph etc.). If a sub-folder structure is necessary (because of the likely volume of documents) I'll divide by geography and/or time-frame. (I'll always include these elements in the name of the file).

This way requires you to have the self-discipline to enter every document into your software and make the appropriate associations, but that's no bad thing. It makes it very quick to decide where to store each document by asking "what?" and maybe "when?" and "where?" but never "who?", which is the most volatile element. And it makes it easy to find or add to the documents associated with an individual or family.

7

I have really two answers to this question.

Since my principal storage is electronic, I'll start there.

How you organize your electronic folders may relate to how and how much you actually save. I try to save things that are not readily available elsewhere. This means that I store few census records, and now fewer and fewer marriage and death records. I do store pension files, deeds, deed indexes and probate records. Unless someone has sent me the paper materials, I don't set out to store any of my research materials in paper form. I store all of my research reports in electronic form. I used to send them broadly to cousins (ala, spam the family), but I find myself blogging much of that work now. I save many e-mails to pdf.

The computer filing system I have used since the 1990s is summarized below.

Note: I have patronymics in my family tree and the spelling of some names changed over time. The word "surname" doesn't quite describe top level organization; here I'll use the term "family lines." Hopefully the examples below are helpful.

Top level organization: Each of my ancestral family lines (as in a pedigree) has a high level directory; I call each a "log." I prepend these folders with the word "log" (including an underscore) so that these folders will group together and can be forced to sort at the very top of my Genealogy directory. For example

Each of the family lines has a default subfolder system that classifies a "generation," and each of my direct ancestors has a first level folder. I assign a subfolder to the other children of my ancestors (siblings). For this purpose, I use a numbering system that is not unlike Register or NGSQ to track each "node" of my direct ancestors in the family line. An example follows.

Other than for the higher level direct ancestors, I don't create a folder for the other children (siblings) unless I've actually conducted research about them.

For every surname, I have a "dump" folder, too. I use this for information about the larger family and for things I don't want to categorize further at the time. These "dump" folders carry the same general name, "XXX [surname assigned to the family line] and Related"

From time to time, research turns into what, for this purpose, I will call "negative proof." When that happens, I create "Not" folders to hold that research. So, for example:

Under the main folder/directory, "Genealogy" I have a number of other folders. A few examples follow:

• Genealogy General. This folder contains electronic versions of my genealogical reference materials. For example, in this folder and its associated subfolders I have Black's Law, Evidence Explained, Numbering Your Genealogy ...., etc.
• Descendant Researchers. This folder contains family files shared with me and some e-mail histories. Usually there is a subdirectory that carries the name of the collaborating researcher. For example, probably the largest folder in that section is titled "William Smith," with whom I've been collaborating since the 1990s.
• Album. This folder includes the work I'm doing to digitize family collections.

Somewhat on topic, I blog and work to develop full citations for the entries in my family file. I take full advantage of the note keeping features of genealogical software. The benefit is that today I don't really obsess much over the drive organization the way I did in, say, 2000. If I have recorded the author and title of a source in my file or on my blog, I can almost always find that puppy and all my related research notes.

As above, I don't set out to "store" any materials that are readily available from "stable" sites on the internet. I may download a birth or census file for the purpose of attaching it to an e-mail, or use it to create a graphic for a blog article. I'll more than likely delete that electronic file after I've finished that work.

I do invest some worry time in the material paper files and collections that I hold, as these generally represent privately held materials that have not been widely circulated. I work to keep those materials intact (even in the original order received); most are stored in archival format (archival sleeves, etc.). Dictated largely by dominance, those materials are generally organized by surname (family line), researcher name, or collection/repository.

Updated: When I digitize collection materials, each item (scan) is given a descriptive number or code. I write the code/number on a sticky dot that goes on the face of the archival sleeve. An example of a code is "600T-1063." The first set of code represents the resolution and file format (so 600T stands for 600 dpi, Tiff). The second group of numbers represents the electronic scan number (part of the file name; ultimately intended to be part of the metadata).
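
To make the code format concrete, here is a minimal Python sketch that splits a code such as "600T-1063" into its parts. The function `parse_scan_code` and the lookup table are hypothetical; only the letter T (TIFF) is given in the text, so any other format letters are left for the reader to add.

```python
import re
from typing import NamedTuple

# Only "T" (TIFF) is given in the text; other letters would need their own entries.
FORMAT_LETTERS = {"T": "TIFF"}


class ScanCode(NamedTuple):
    dpi: int
    file_format: str
    scan_number: int


def parse_scan_code(code):
    """Split a code such as '600T-1063' into resolution, file format and scan number."""
    match = re.fullmatch(r"(\d+)([A-Z])-(\d+)", code)
    if match is None:
        raise ValueError(f"unrecognised scan code: {code!r}")
    dpi, letter, number = match.groups()
    if letter not in FORMAT_LETTERS:
        raise ValueError(f"unknown format letter: {letter!r}")
    return ScanCode(int(dpi), FORMAT_LETTERS[letter], int(number))


parsed = parse_scan_code("600T-1063")
print(parsed)
```

Rejecting anything that does not match the pattern is a design choice in this sketch: a mistyped code is easier to catch when it is written on the sleeve than years later.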

Update 2: You asked specifically about where to store birth information about children--whether with the parents' information or in the folder about a married adult. I don't think I really have a black and white rule, but my direct ancestors' families are organized in the male ancestor's line--so most of the records about my maternal ancestors are stored under their husbands' family line. Research about my maternal ancestors' sister-siblings, however, is stored under the father's family lines.

There are well documented paper storage methods that recommend storing information by record type (birth, marriage, death, deed, probate, etc.). That system didn't work as well for me, maybe because I have patronymics and, even when I don't, there are so many similarly named persons in my different family lines.

5

Folders should begin at the surname level. Anything else is unfriendly toward future users of your collection.

Pretend that you want to donate your files (physical and digital) to a library or society in five years. Go ask them how they would like them organized, so patrons can find and utilize them. Do they want a file marked "miscellaneous birth certificates collected by and related to Steven O'Neill" or "O'Neill family records collected by Steven O'Neill"?

Break up your files into 8 or 16 surnames minimum (I probably have 60 or 70). Large surname files can be further divided into collateral lines, or by source type (or both). Use your own judgement when a file is too large (100 pages? 500 pages?). I'm doing a one name study on my surname, and have it divided into clans.

When determining your sub-folders, keep the library in mind. Do they want them organized in the order they were collected? No. Do they want to learn the secret coding system you made up? No. Do they want a copy of your software, which is required to find any particular file? No.

Keep it simple so any researcher, age 12 and above, can use it. You don't need copies of records that are easily found online. If you wish to avoid duplication, keep a wife's birth certificate with her parents' family file, and write a note to see her marriage in her husband's family file. Then on the marriage certificate, add a note to find the wife's birth.

1

@RustyErpenbeck I like this. Your answer is different from the others, but it makes sense. I initially thought that my genealogy research was just a fun way to chill out and spend alone time. But now that I've invested so much into it, I definitely don't want it to get tossed out when I'm gone. And when people are purging, anything that looks complicated will hit the bin. – Canadian Girl Scout – 2016-05-24T08:10:42.923

Is a library going to want a collection of readily-available electronic documents, divorced from its context (the documentation of your conclusions and their sources, be that a print-out or a common electronic document format)? And if the question was about physical documentation, I'd expect any library to apply its own cataloging standards. – None – 2013-01-09T12:59:03.450

As I said in my answer, readily-available documents need not be saved at all. You only need to preserve the hard-to-find and privately-held documents. Our library and historical society both keep large collections of local "family files" that researchers have added to over the years. Their contents are not individually cataloged, just rows and rows of surname files. The library is now working toward accepting family files in PDF form for their website. – Rusty Erpenbeck – 2013-01-09T19:09:23.410

Sorry -- missed the 'easily found' piece in the last paragraph. But the question is still about electronic documents, not physical documents. – None – 2013-01-09T19:44:36.783

1

The same holds true for electronic documents. They need to be easily searchable by whoever inherits them. And if a 4th cousin asks you to share your "Johnson" file, you should be able to do that effortlessly. Just keep it friendly, easy to search, easy to share. – Rusty Erpenbeck – 2013-01-10T00:54:44.730

4

This answer has been revised -- the beginning of this answer addresses the problem posed in the question about source material like birth certificates which name more than one person.

I store my electronic documents by record type and record group, with subfolders for each vendor if necessary (e.g. if I have census images from both Ancestry and Heritage Quest). The basic principle is to keep things organized by the record group they came from. Family documents and other materials which do not originate from a commercial vendor are grouped together in a 'personal archives' folder. Files originally sent from cousins or other correspondents are kept in folders marked with the correspondent's name, along with 'printouts' of emails which contain transcriptions or other genealogical discussion.

This allows me to group an image together with other like items on the same microfilm roll. If I have pages from a City Directory, I can keep all the pages from the same directory together -- the title page, the pages which describe the town, the numerical directory, the alphabetical directory, ads, information on organizations, and so on.

If I have multiple pages from the Vital Records to 1850 from NEHGS for that town, I have the pages that explain the abbreviations, the publication data, and other material, plus all the pages I found from that volume, grouped together in the same folder.

I can have all my 1940 US Federal Census pages together with a blank form for that census, the enumerator instructions and list of questions from IPUMS, and other supporting material for that census. If I'm reading an image and I can't remember what the column header says, the reference material I need is right there.

If I have supporting documents about a data collection (e.g. the National School Admissions registers on Find My Past), like the chart of which schools are included in the collection, I can save that in the folder along with the digital images and PDFs of abstracts from that data set.

Now -- the people who file by surname are probably asking, "But what if you want to find all the material about the same person?"

There are advantages to having a folder system based on places and on people for the material you generate yourself -- your own research reports, your notes, your Source Checklists, and so on -- but it is madness to store the actual source material, the census records and birth certificates in those files as your permanent storage place. It's not practical to make copies of those kinds of sources for each of the people of interest mentioned in them, even in digital form.

It's much easier to make a Genealogy Source Checklist in Excel so you can see at a glance what you've looked for, and what you are missing -- for an example of how to make one, see Crista Cowan's YouTube video.

I use the writing software Scrivener when I am working on a project. In Scrivener, I can link to all the images, all the abstracts, my research checklist, and all the other material I want for the project I'm working on.

After I have completed the initial analysis of the source material, I can also retrieve the images by looking at the person of interest in Family Historian or Clooz (the images are linked to the relevant source, not to the person, but it's easy to find the sources).

If the images have not been processed and filed yet, I can find them in my 'intake' folders, sorted by which vendor they came from and what dataset. This suits my workflow, because I find it most effective to process like items together for the initial data entry (all baptisms at once, all 1940 Census records at once, etc.)

Why do it this way? If you maintain a strict separation between your own work product, and the material which you have collected from somewhere else, then you can send your cousin your own files of "what you have found on a particular person" without also sending them other people's copyrighted material which you do not have the right to share.

3

I'm trying Qiqqa ("quicker" in an Australian accent), a free research and reference manager popular among my fellow scientists.

If you point it at a folder, it will process all the pdfs, pulling out the title (regardless of file name), and metadata. It has duplicate file recognition. You can set it up with a watch folder, and it will process newly added items automatically. You can keep your libraries local, or put some/all items on their server so they are available on your mobile devices. The free account comes with nominal storage or you can buy more.

You read your pdfs inside Qiqqa, where you can add tags, comments, highlights and annotations. There is a "powerful annotation report" but I haven't tried that. Qiqqa will search your library by tag, or full text (PDFs are "textified" via OCR as they are processed). In addition to the tags automatically added, I add others - surnames, location, type (e.g., BMD, deed, probate, etc.), repository....

I think this will work for me - it doesn't matter what the file is named or where it is. I can add relevant tags as I work through each item. I have yet to try saving my image files as pdfs, so I can't say if the zoom is sufficient for transcription.

3

This is a somewhat odd answer in that I would suggest a relatively non-hierarchical filing system, one with as few folders as possible. Specifically, I would suggest leaving most documents lumped together in one "genealogy" folder. Title each image with the name of the person or persons, the type of record and perhaps the date of the record if that isn't in the image itself. Then, if you need to find something, just use the little search bar that is in the upper right hand corner of the window (at least, it is if you are using Windows).

For instance, say you are interested in John Doe and his wife Jane Smith. They might have documents titled:
John Doe birth certificate
Jane Smith birth certificate
John Doe and Jane Smith marriage license
John Doe 1880 census
John Doe 1900 census
etc. etc.

If you want to see every record about John Doe, you just search "John Doe." If you want to see every birth certificate you have, search "birth certificate."
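
The answer relies on the Windows search bar. As a stand-in for that, here is a minimal Python sketch of the same idea: a flat folder whose file names carry the person and record type, searched by a plain case-insensitive substring match. The function `search_titles` and the `.jpg` extension are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def search_titles(folder, phrase):
    """Return the names of files in folder whose name contains phrase, ignoring case."""
    wanted = phrase.lower()
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and wanted in p.name.lower()
    )


titles = [
    "John Doe birth certificate.jpg",
    "Jane Smith birth certificate.jpg",
    "John Doe and Jane Smith marriage license.jpg",
    "John Doe 1880 census.jpg",
    "John Doe 1900 census.jpg",
]

# Build a throwaway flat "genealogy" folder containing empty placeholder files.
with tempfile.TemporaryDirectory() as genealogy:
    for title in titles:
        (Path(genealogy) / title).touch()

    about_john = search_titles(genealogy, "John Doe")
    certificates = search_titles(genealogy, "birth certificate")
    print(about_john)
    print(certificates)
```

Because the match is on the exact phrase, it only works if names are written the same way every time.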

There are a couple of key things to keep in mind for this to work.
1. Always use the same name for indexing purposes. If an individual used both Smith and Smyth, pick one and use it for all records relating to that person, putting the other name in parentheses if the record spells it that way, i.e. Jane Smith (Smyth).
2. Always reference women by their maiden names. It reduces confusion.
3. Always list both people on a marriage license. You can do the same for censuses, but I just list the head of household.

The only things that I break off into separate folders are multi-page documents that all go together, such as wills, probates, or excerpts from published genealogies. And if you have two page birth certificates or the like, I just number them sequentially, such as "John Smith birth certificate 1" and "John Smith birth certificate 2."

The reason I do this is because of some of the downsides already listed. If I create a folder for every person, that's over 1000 folders, and I would have to copy census and marriage records to multiple people. If I break it up by records, then it becomes hard to track individual people. I figure the computer has a search function, I should take advantage of it just like I would a filter in a genealogy program. If I want to look at documents organized by people, I can search the name. If I want to see them organized by type, I can search the type.

As an aside, I would take some issue with the idea of not saving copies of "easily accessible" records like birth certificates. As we all know, finding these records can sometimes be a royal pain. Maybe they were indexed in a weird way and we just stumbled on them. Maybe they weren't where we expected. Maybe they were easy to find last year, but now you can't for some reason. Besides that, websites, even seemingly stable ones like FamilySearch and Ancestry, can come and go. They could disappear tomorrow. Their servers could burn in a fire. The microfilm or the original records could vanish. They could change their terms of service. For financial reasons you may need to suspend your account. Or maybe you just want to be able to access the records offline, or without having to go searching again.

I figure that the only way I can guarantee that I will be able to double-check my research and have the references available to someone else, is to have a copy of the record. Thus, I download every relevant record I find. If it's an index only record, I take a screenshot. If it's a grave on Find a Grave, I take multiple screenshots. I think it is just a prudent precaution to take.

I mean, if you have your great grandfather's original birth certificate and you record all the information in your genealogy software, you wouldn't then go burn the original certificate, would you?

Upvoted especially for the advice to keep a local copy of images. We can't depend on all of them staying online even if we found them there originally. – Jan Murphy – 2017-07-30T00:11:56.133

3

What works well for me is an adaptation of William Dollarhide's system. All my source documents are given a record number in the following format: S-ST-CTY-XXX- Subject name, Source Description (where source document was found), where:

• S = first initial of the family group surname
• ST = state where the document originated
• CTY = county where the source document originated
• XXX = a three-digit number

So, for example, V-AL-CHO-138- James Turner, "Dies at Home", 3 Mar 1912 (Cho Advocate) would reference my Vaughan family line, Alabama, Choctaw County, record #138, the subject's name, the title of the article, the date of the article, and (the newspaper the item was found in).

Using this naming system, my computer automatically sorts all files by surname and location. Correspondence and Compiled sources are handled a little differently.
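
A minimal Python sketch of the S-ST-CTY-XXX- pattern follows, to show how the pieces fit together. The function `record_name` is hypothetical and is not part of the system described above; it only assembles the string from the parts the answer lists.

```python
def record_name(surname_initial, state, county, number, description):
    """Build a name in the S-ST-CTY-XXX- pattern, zero-padding the record number."""
    if not 0 <= number <= 999:
        raise ValueError("record number must fit in three digits")
    prefix = f"{surname_initial.upper()}-{state.upper()}-{county.upper()}-{number:03d}"
    return f"{prefix}- {description}"


example = record_name(
    "V", "AL", "CHO", 138,
    'James Turner, "Dies at Home", 3 Mar 1912 (Cho Advocate)',
)
print(example)

# Low record numbers are padded so that they still sort in numeric order.
padded = record_name("v", "al", "cho", 7, "placeholder description")
print(padded)
```

One practical caveat: Windows does not allow double quotes in file names, so a title quoted this way would need the quotes dropped or replaced before it is used as an actual file name there.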

Welcome to G&FH SE! If you have not already done so, I think it is worth every user of this site taking the 2-minute [Tour]. I hope you will enjoy both answering and asking questions here. – PolyGeo – 2015-03-29T04:23:16.210