The Meintjes NAAIRS Parser
Using The NAAIRS Parser, by Keith Meintjes.
Introduction
NAAIRS is the online index to the South African Archives. You choose a database (“RSA” covers all of them) and then run a search. A typical result looks like this:
Document | 4 of 4 |
DEPOT | KAB |
SOURCE | MOOC |
TYPE | LEER |
VOLUME_NO | 7/1/271 |
SYSTEM | 01 |
REFERENCE | 49 |
PART | 1 |
DESCRIPTION | VAN DER WALT, HESTER. WIFE OF JOHANNES PETRUS DUVENAGE. WILL. |
STARTING | 18520000 |
ENDING | 18520000 |
REMARKS | FILED 1864. |
So, each result is spread across many lines. In addition, there are those pesky hard spaces NAAIRS uses to fill out each line to 80 characters. I asked my son, Ian, if he could write a parser that puts the information for each result into columns in a spreadsheet, like this:
N | DEPOT | SOURCE | TYPE | VOLUME_NO | SYSTEM | REFERENCE | PART | DESCRIPTION | STARTING | ENDING | REMARKS |
4 | KAB | MOOC | LEER | 7/1/271 | 1 | 49 | 1 | VAN DER WALT, HESTER. WIFE OF JOHANNES PETRUS DUVENAGE. WILL. | 18520000 | 18520000 | FILED 1864. |
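The parser's own code is not described here. As a rough illustration of the kind of conversion it performs, here is a minimal Python sketch; it is not Ian's implementation. It assumes the scraped lines look like the example above, one "NAME | value |" pair per line, with each document starting at a "Document | n of m |" line, and that the hard spaces are the Unicode non-breaking space U+00A0. Real NAAIRS text may separate names and values differently, and values wrapped onto continuation lines are not handled. The names split_field, records_to_rows and COLUMNS are illustrative only.

```python
NBSP = "\u00a0"  # ASSUMPTION: the NAAIRS "hard spaces" are non-breaking spaces

# Column order from the spreadsheet example; N is the document number.
COLUMNS = ["N", "DEPOT", "SOURCE", "TYPE", "VOLUME_NO", "SYSTEM",
           "REFERENCE", "PART", "DESCRIPTION", "STARTING", "ENDING", "REMARKS"]


def split_field(raw):
    """Split 'NAME | value |' into (NAME, value); return None for other lines.

    ASSUMPTION: fields are separated by '|' as in the example above.
    """
    line = raw.replace(NBSP, " ").strip().rstrip("|")
    name, sep, value = line.partition("|")
    if not sep or not name.strip():
        return None
    return name.strip(), value.strip()


def records_to_rows(text):
    """Turn scraped NAAIRS text into one list of column values per document."""
    records = []
    for raw in text.splitlines():
        field = split_field(raw)
        if field is None:
            continue  # blank lines, page headers and other non-field lines
        name, value = field
        if name == "Document":
            # 'Document | 4 of 4 |' starts a new record; keep the '4'.
            records.append({"N": value.split()[0] if value else ""})
        elif records and name in COLUMNS:
            records[-1][name] = value
    # A field missing from a document becomes an empty cell.
    return [[rec.get(col, "") for col in COLUMNS] for rec in records]
```

Run on the document shown earlier, records_to_rows returns a single row beginning "4", "KAB", "MOOC", "LEER", "7/1/271", "01". Because every value is kept as text, SYSTEM stays "01"; a spreadsheet that treats it as a number would show 1.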
My idea is to aid my research by adding a field for notes. I now have a permanent record of the documents I have looked at, and the spreadsheet can easily be searched and sorted.
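Because NAAIRS dates are eight-digit YYYYMMDD strings, with 00 apparently standing in for an unknown month or day (18520000 for 1852), sorting them as text puts them in chronological order. A minimal Python sketch of sorting and searching, assuming each document is held as a dictionary keyed by the column names above (a hypothetical layout, not necessarily the parser's; sort_by_ending and search are illustrative names):

```python
def sort_by_ending(records):
    """Sort documents chronologically by their ENDING date.

    ASSUMPTION: dates are 8-character YYYYMMDD text such as '18520000',
    so plain string order matches date order. Missing dates sort first.
    """
    return sorted(records, key=lambda rec: rec.get("ENDING", ""))


def search(records, word):
    """Return documents whose DESCRIPTION contains word (case-insensitive)."""
    word = word.upper()
    return [rec for rec in records if word in rec.get("DESCRIPTION", "").upper()]
```

The same ordering holds whether a spreadsheet treats these values as text or as numbers, since every date has exactly eight digits.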
Here is the high-level process:
- Scrape the results of a NAAIRS search into a text file.
- Load this file into the parser.
- Copy the parser results to your Clipboard.
- Paste the results into Excel, or into a file of your choice.
You may then want to play with the Excel file to format headings, column widths, text wrapping, and the like.
Here are the details:
Step 1: Open a new file in a simple text editor such as Notepad. Save it with a meaningful name, say, “MySearch.txt”.
Step 2: Open NAAIRS at http://www.national.archsrch.gov.za/sm300cv/smws/sm300dl and choose your database (“RSA” for all). A new page will appear. Enter your search parameters and press “Enter” or click “Search”. On the next screen, choose “Result Summary”. This screen lists 20 hits. Click “Select Page”, which puts a check mark next to each document, or manually select the documents you are interested in. At the bottom of the page, click “Next”, then click “Select Page” again. Continue until you have selected all the citations of interest on every page of the Result Summary.
Step 3: At the top of the NAAIRS screen, choose “Multiple Documents”. This displays all the documents you selected in Step 2. Carefully select all the text and copy it (CTRL+C). Paste it (CTRL+V) into the file you created in Step 1, and save it.
Step 4: Go to the parser at https://meintjes.github.io/ and click “Choose File”. Select the text file you saved in Step 3. There is then NO ACTION BUTTON to press; the parser works on the file as soon as you select it. Click the “Copy to clipboard” button, then copy (CTRL+C) the highlighted text to the Clipboard.
Step 5: Paste the parser output into Excel or any other application. There are many ways to get it into Excel; whichever you use, be sure to paste or import it as text, otherwise Excel may reformat some of the fields. Done. In an application like Excel, you can now format, search and rearrange your NAAIRS results.
Acknowledgement
Many thanks to Geoff Chew for testing the early alpha version of the parser on GitHub, and to Johann Hanekom for his beta testing and valuable suggestions.
On Text Files, Excel, and NAAIRS
The purpose of the parser is to take a text file scraped from NAAIRS and turn it into a tab-delimited, column-oriented file. It is not meant to teach you how to use NAAIRS or Excel. When using NAAIRS, I always set “End Date” <= 2222, which sorts the results by ending date.
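A tab is a natural delimiter here: descriptions such as “VAN DER WALT, HESTER.” contain commas, and Excel normally splits pasted tab-separated text into columns. The sketch below uses Python's standard csv module to show how such a file could be written; the function name rows_to_tsv is hypothetical, and this is not necessarily how the parser produces its output.

```python
import csv
import io


def rows_to_tsv(header, rows):
    """Return tab-delimited text: one header line, then one line per document."""
    buf = io.StringIO()
    # lineterminator="\n" avoids the csv module's default "\r\n" line endings.
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```

With the default minimal quoting, the writer only wraps a value in double quotes if it contains a tab, a double quote, or a newline, so ordinary NAAIRS values, commas included, come out unchanged.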
Now, there is a bug in NAAIRS. (Actually, there are many bugs in NAAIRS, but that’s a different discussion.) When you do a NAAIRS search that has more than 400 results, you will see this at the bottom of the “Result Summary” page:
Result Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 [ NEXT>>]
Click on “20”. You are now on page 20 and should see results 381 to 400. Now click “NEXT”. You are on page 21, but the message above does not change to tell you so, and it is easy to lose track of which page you are on. You could work it out, but my solution is to do a new search starting with End Date >= the last date on page 20.
There are numerous ways to “Paste” or “Import” into Excel. Be sure that you bring the data in as “Text”, and check that Excel has not reformatted any fields. I appreciate that this procedure can seem a little cumbersome, but you only need to do it once for each family or topic you are interested in. I have been printing the column-based parser files and having them spiral-bound at my local office store. They serve as research log books. For each interest, I doubt I will ever run the NAAIRS search or print the parsed files more than once.
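Keeping track of pages by hand is simple arithmetic, assuming 20 hits per Result Summary page and hits numbered from 1: page p shows results 20(p - 1) + 1 through 20p, so page 20 ends at result 400 and result 401 is on page 21. A small Python helper (hypothetical names page_span and page_of) makes this explicit:

```python
RESULTS_PER_PAGE = 20  # the Result Summary lists 20 hits per page


def page_span(page):
    """Return the (first, last) result numbers shown on a Result Summary page."""
    first = (page - 1) * RESULTS_PER_PAGE + 1
    return first, page * RESULTS_PER_PAGE


def page_of(result_number):
    """Return the Result Summary page on which a given result number appears."""
    return (result_number - 1) // RESULTS_PER_PAGE + 1
```

For example, page_span(21) gives (401, 420), and page_of(437) gives 22.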
Copyright and Use Restrictions
This tool is offered for free distribution, at no cost. It is copyright © Keith Meintjes, 2017. No part of this work may be offered for sale, nor posted on a site that has a paywall or requires a subscription fee for access. This tool is provided only for personal use by individual researchers. NAAIRS is subject to the copyright of its owners, the South African Archives.
Please DO NOT post compendiums of NAAIRS search results online.
Getting Help
To use the NAAIRS parser you should be familiar with NAAIRS and with Excel or a similar table-oriented tool. To get help, I suggest you post on the Rootsweb South-Africa list: http://lists.rootsweb.ancestry.com/index/intl/ZAF/SOUTH-AFRICA.html
If you believe there is a bug, please e-mail me with “NAAIRS Parser” in the subject line.
Keith Meintjes
Michigan, USA