README — Building and Maintaining the nk-data Database

This page explains how to (1) recreate the nk-data folder (CSV files + catalog) from the refractiveindex.info (RII) YAML database, and (2) add custom sources manually so the shared nk function can use them immediately. (Let me know of any glaring errors in the README as I haven't scrutinised it nearly as much as I did the code it's talking about.)

TL;DR: Run rii2catalog.py once to build nk-data/. When you get a new dataset, drop a CSV into nk-data/data/ and add a small JSON block to nk-data/catalog/catalog.json.

0) Expected layout

project/
├─ nk.m                    % (or nk_sam.m) the MATLAB function
└─ nk-data/
   ├─ data/                % CSV files: wavelength_um,n,k
   └─ catalog/
      └─ catalog.json      % all sources + metadata

The function looks for nk-data/ next to itself (or via opts.data_root or the NK_DATA_ROOT environment variable).
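
The lookup described above can be pictured with a small Python sketch. The precedence shown (explicit option first, then NK_DATA_ROOT, then the folder next to the function file) is an assumption; this page does not state the order nk.m actually uses, and resolve_data_root is a hypothetical name.

```python
import os
from pathlib import Path

def resolve_data_root(script_dir, data_root=None, env=None):
    """Return the nk-data folder to use.

    Assumed precedence (not confirmed by this README):
    explicit option, then NK_DATA_ROOT, then nk-data/ beside the function file.
    """
    env = os.environ if env is None else env
    if data_root:
        return Path(data_root)
    if env.get("NK_DATA_ROOT"):
        return Path(env["NK_DATA_ROOT"])
    return Path(script_dir) / "nk-data"

# With nothing configured, fall back to the folder next to the script.
print(resolve_data_root("project", env={}))
```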

1) Rebuild nk-data from refractiveindex.info

Prerequisites

Python 3.9+
PyYAML (for parsing RII YAML)
The RII database (release zip or cloned repo)

PowerShell:

python --version
pip install pyyaml

Get the RII database

Download/unzip the official RII database so you have a path like:

C:\...\rii-db\database\
  data\main\...
  data\organic\...
  data\glass\...
  data\other\...
  data\sopra\...
  data\3d\...

(Those subfolders are “shelves” of materials.)

Run the converter

Use the included rii2catalog.py to convert YAML → CSV + catalog JSON.

PowerShell (from your project folder that has nk.m and rii2catalog.py):

$DBROOT = "C:\Users\<you>\...\rii-db\database"
python .\rii2catalog.py --db-root "$DBROOT" --out-root .\nk-data --log-sampling `
  --shelves data/main data/organic data/glass data/other data/sopra data/3d

What it produces:

CSV files into nk-data/data/ with the exact header:

wavelength_um,n,k

 Wavelengths are in micrometers (µm). n or k may be blank if absent.

catalog.json into nk-data/catalog/ with, per source:

 * source_id (stable identifier)
 * kind (thinfilm or bulk) and valid_thickness_nm if detected
 * path (relative to nk-data)
 * valid_wavelength_um = [wmin, wmax]
 * has_k (boolean)
 * year (parsed from the reference text; newer wins ties)
 * ref_string, notes, provenance fields
 * SHA-256 checksums for each CSV (optional but included)
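
To make the CSV convention concrete, here is a minimal Python sketch that reads a file in the wavelength_um,n,k layout and maps blank n or k cells to None. The function names and the sample values are illustrative only; they are not part of rii2catalog.py or nk.m.

```python
import csv
import io

def _optional_float(cell):
    """Convert a CSV cell to float, or None when the cell is blank or missing."""
    cell = (cell or "").strip()
    return float(cell) if cell else None

def read_nk_csv(text):
    """Parse wavelength_um,n,k CSV text into (wavelength_um, n, k) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (float(rec["wavelength_um"]), _optional_float(rec["n"]), _optional_float(rec["k"]))
        for rec in reader
    ]

# Illustrative rows: the first has no k value.
sample = "wavelength_um,n,k\n0.40,1.47,\n0.50,1.46,0.001\n"
print(read_nk_csv(sample))
```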

Supported formula sampling:

Formula 1 (Sellmeier-like) and 4 (Cauchy) → sampled to n(λ)
Other formula types are skipped (console shows [SKIP])
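
As a rough illustration of what sampling a dispersion formula means, the sketch below evaluates the RII formula 1 (Sellmeier) form, n² − 1 = c₁ + Σ c₂ᵢ·λ² / (λ² − c₂ᵢ₊₁²), on log-spaced wavelengths. It assumes that --log-sampling means wavelengths spaced evenly in log(λ); how rii2catalog.py actually samples is not documented on this page. The coefficients are the widely quoted Malitson (1965) values for fused silica, and the function names are hypothetical.

```python
import math

def log_spaced(wmin_um, wmax_um, count):
    """Wavelengths spaced evenly in log(wavelength), endpoints included."""
    step = (math.log(wmax_um) - math.log(wmin_um)) / (count - 1)
    return [math.exp(math.log(wmin_um) + i * step) for i in range(count)]

def sellmeier_n(wavelength_um, coeffs):
    """Formula 1: n^2 - 1 = c[0] + sum_i c[2i+1] * w^2 / (w^2 - c[2i+2]^2)."""
    w2 = wavelength_um ** 2
    total = coeffs[0]
    for strength, resonance_um in zip(coeffs[1::2], coeffs[2::2]):
        total += strength * w2 / (w2 - resonance_um ** 2)
    return math.sqrt(1.0 + total)

# Fused silica (Malitson 1965), wavelengths in micrometers.
FUSED_SILICA = [0.0, 0.6961663, 0.0684043, 0.4079426, 0.1162414, 0.8974794, 9.896161]

for w in log_spaced(0.4, 1.6, 5):
    print(f"{w:.4f},{sellmeier_n(w, FUSED_SILICA):.5f},")  # k left blank
```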

Notes

All wavelengths are stored in µm (RII’s internal unit). The MATLAB caller can pass meters/nm/µm; nk converts internally.
Pages with multiple DATA blocks (e.g., multiple thicknesses) produce multiple CSVs. Per-entry thickness is inferred where possible.
Some families (e.g., ITO) live under umbrella groups in RII. The MATLAB function augments these at runtime (e.g., synthetic materials.ito built from mixed_crystals).
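
The unit handling in the first note amounts to scaling the caller's wavelength to µm before any lookup. A minimal Python sketch, assuming the three unit names 'm', 'nm' and 'um' mentioned on this page (to_um is a hypothetical helper, not the code in nk.m):

```python
# Multipliers that convert each supported unit to micrometers.
TO_UM = {"m": 1e6, "nm": 1e-3, "um": 1.0}

def to_um(wavelength, units="um"):
    """Convert a wavelength given in m, nm or um to micrometers."""
    if units not in TO_UM:
        raise ValueError(f"unsupported units: {units!r}")
    return wavelength * TO_UM[units]

print(to_um(532e-9, "m"))  # about 0.532
print(to_um(532, "nm"))    # about 0.532
```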

2) Manually adding a new dataset

You can add lab data or literature values without re-running the converter.

2.1 Prepare the CSV

Create a file in nk-data/data/ with this exact header:

wavelength_um,n,k

Wavelengths: in µm, strictly increasing.
If you only have n, leave the k column blank. The MATLAB function can treat missing k as zero with opts.allow_k_zero = true.
Suggested filenames: <material>.<page>[.<tag>].csv (the tag is optional), e.g.:

au.smith-2025.d1.csv
sio2.custom-lab-2025.csv
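
Before adding a hand-made file to the catalog, it can help to check the two rules above mechanically. The following Python sketch is one possible check (check_nk_csv is a hypothetical helper, not something shipped with nk.m or rii2catalog.py); it verifies the exact header and that wavelengths are strictly increasing.

```python
import csv
import io

EXPECTED_HEADER = ["wavelength_um", "n", "k"]

def check_nk_csv(text):
    """Return a list of problems found in the CSV text; an empty list means OK."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows or rows[0] != EXPECTED_HEADER:
        return ["header must be exactly wavelength_um,n,k"]
    problems = []
    previous = None
    for lineno, row in enumerate(rows[1:], start=2):
        if len(row) != 3:
            problems.append(f"line {lineno}: expected 3 columns")
            continue
        try:
            wavelength = float(row[0])
        except ValueError:
            problems.append(f"line {lineno}: wavelength is not a number")
            continue
        if previous is not None and wavelength <= previous:
            problems.append(f"line {lineno}: wavelength not strictly increasing")
        previous = wavelength
    return problems

# A blank k cell is fine; a repeated wavelength is reported.
print(check_nk_csv("wavelength_um,n,k\n0.5,1.46,\n0.5,1.46,\n"))
```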

2.2 Add a source entry to catalog.json

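Open nk-data/catalog/catalog.json. Find the target material under materials; if it doesn’t exist, add a new object for it. Then append a source record to its sources array. The // annotations in the example are explanatory only: JSON has no comment syntax, so leave them out of the real file.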

{
  "source_id": "au.smith-2025.d1",
  "kind": "thinfilm",                      // or "bulk"
  "path": "data/au.smith-2025.d1.csv",     // relative to nk-data/
  "valid_wavelength_um": [0.4, 1.1],       // µm
  "valid_thickness_nm": [100, 100],        // [tmin,tmax] in nm, or null for bulk
  "priority": 1,                           // not used by nk; keep 1
  "year": 2025,                            // tie-breaker (newer wins)
  "ref_string": "Smith et al., Opt. Lett. (2025) ...",
  "notes": "Ellipsometry, sputtered Au on glass, room temp.",
  "shelf": "lab",
  "book": "Au",
  "page": "smith-2025-d1",
  "has_k": true
}

Material block (simplified):

"au": {
  "aliases": ["Au","gold"],
  "default_source_id": null,
  "sources": [
    { ... },                               // existing sources
    {
      "source_id": "au.smith-2025.d1",
      "kind": "thinfilm",
      "path": "data/au.smith-2025.d1.csv",
      "valid_wavelength_um": [0.4, 1.1],
      "valid_thickness_nm": [100, 100],
      "priority": 1,
      "year": 2025,
      "ref_string": "Smith et al., Opt. Lett. (2025) ...",
      "notes": "Ellipsometry, sputtered Au on glass, room temp.",
      "shelf": "lab",
      "book": "Au",
      "page": "smith-2025-d1",
      "has_k": true
    }
  ]
}
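
If you would rather not edit the JSON by hand, the same append can be scripted. This Python sketch assumes the layout shown above: a top-level "materials" object keyed by material name, each with a "sources" array. The helper add_source is hypothetical and is not part of rii2catalog.py.

```python
import json

def add_source(catalog, material_key, source, aliases=()):
    """Append `source` to catalog["materials"][material_key]["sources"].

    Creates the material block if it is missing and refuses a source_id
    that already exists anywhere in the catalog.
    """
    materials = catalog.setdefault("materials", {})
    for block in materials.values():
        for existing in block.get("sources", []):
            if existing["source_id"] == source["source_id"]:
                raise ValueError(f"duplicate source_id: {source['source_id']}")
    block = materials.setdefault(
        material_key,
        {"aliases": list(aliases), "default_source_id": None, "sources": []},
    )
    block["sources"].append(source)
    return catalog

# In-memory example; for the real file, json.load it, call add_source,
# then json.dump it back with indent=2.
catalog = {"materials": {}}
add_source(catalog, "au", {"source_id": "au.smith-2025.d1", "kind": "thinfilm"}, ["Au", "gold"])
print(json.dumps(catalog, indent=2))
```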

Checksums (optional, for info.data_checksum):

$h = (Get-FileHash .\nk-data\data\au.smith-2025.d1.csv -Algorithm SHA256).Hash.ToLower()

Add to the "checksums" object:

"data/au.smith-2025.d1.csv": "sha256-<paste the hash here>"

2.3 Field rules (quick)

source_id: unique; recommend <material>.<page>[.dN]
kind: "thinfilm" if thickness known, else "bulk"
valid_wavelength_um: matches your CSV’s min/max
valid_thickness_nm: [t,t] for single thickness; null for bulk/unknown
year: 4-digit year (tie-breaker)
has_k: true if the k column contains values (even if some rows are blank)
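
Several of these fields, and the optional checksum from 2.2, can be derived from the CSV itself rather than typed by hand. The Python sketch below is one way to do that; describe_csv is a hypothetical helper, and it assumes has_k should be true when at least one row carries a k value. The checksum is the lowercase SHA-256 hex digest of the raw file bytes with the sha256- prefix, matching the PowerShell recipe above.

```python
import csv
import hashlib
import io

def describe_csv(data, thickness_nm=None):
    """Derive catalog fields and a checksum from raw wavelength_um,n,k CSV bytes."""
    reader = csv.DictReader(io.StringIO(data.decode("utf-8")))
    wavelengths = []
    any_k = False
    for rec in reader:
        wavelengths.append(float(rec["wavelength_um"]))
        any_k = any_k or bool((rec.get("k") or "").strip())
    known = thickness_nm is not None
    fields = {
        "kind": "thinfilm" if known else "bulk",
        "valid_wavelength_um": [min(wavelengths), max(wavelengths)],
        "valid_thickness_nm": [thickness_nm, thickness_nm] if known else None,
        "has_k": any_k,  # assumption: true when any row has a k value
    }
    checksum = "sha256-" + hashlib.sha256(data).hexdigest()
    return fields, checksum

# Illustrative bytes; in practice use pathlib.Path(csv_path).read_bytes().
fields, checksum = describe_csv(b"wavelength_um,n,k\n0.4,1.5,\n1.1,1.4,0.2\n", thickness_nm=100)
print(fields)
print(checksum)
```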

3) Sanity checks in MATLAB

Point nk at your database if it isn’t next to nk.m:

setenv('NK_DATA_ROOT', 'C:\path\to\project\nk-data')

Examples:

% Thin-film preference (e.g., Ag at 532 nm, 100 nm film)
[n,k,info] = nk(532e-9, 'ag', struct('units','m','thickness_nm',100));
disp(info.source_id); disp(info.selection_reason);

% Bulk fallback (thickness > 500 nm)
[n2,k2,info2] = nk(780e-9, 'au', struct('units','m','thickness_nm',600));

% Legacy names and inline paths
[n3,k3,info3] = nk(532e-9, 'fused silica', struct('units','m'));  % → SiO2
[n4,k4,info4] = nk(532e-9, 'BK7', struct('units','m'));          % inline Sellmeier
[n5,k5,info5] = nk(532e-9, 'ITO', struct('units','m'));          % ITO aggregator

Expected behavior:

For thickness ≤ 500 nm, thin-film datasets are preferred (closest thickness).
Tie-breaks: has k → newer year → wider λ-span → source_id.
Legacy names (e.g., water, fused silica, diamond, air, ITO) resolve.
BK7 uses the inline Sellmeier to preserve legacy results.
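
The selection rules above can be restated as a short Python sketch. This is an illustration of the documented ordering, not a port of nk.m: pick_source is a hypothetical function, and two details are assumptions (a call with no thickness is treated like the bulk case, and if no bulk source covers the wavelength, any covering source is accepted).

```python
def pick_source(sources, wavelength_um, thickness_nm=None, thin_limit_nm=500):
    """Choose one catalog source record for a wavelength and optional thickness."""
    covering = [
        s for s in sources
        if s["valid_wavelength_um"][0] <= wavelength_um <= s["valid_wavelength_um"][1]
    ]
    if not covering:
        raise LookupError(f"No source covers {wavelength_um} um")

    def distance(s):
        # Zero inside [tmin, tmax], otherwise distance to the nearer bound.
        tmin, tmax = s["valid_thickness_nm"]
        return max(tmin - thickness_nm, 0, thickness_nm - tmax)

    candidates = covering
    films = [s for s in covering if s["kind"] == "thinfilm" and s.get("valid_thickness_nm")]
    if thickness_nm is not None and thickness_nm <= thin_limit_nm and films:
        best = min(distance(s) for s in films)
        candidates = [s for s in films if distance(s) == best]
    elif thickness_nm is None or thickness_nm > thin_limit_nm:
        # Assumption: prefer bulk data, fall back to anything that covers.
        candidates = [s for s in covering if s["kind"] == "bulk"] or covering

    def tie_key(s):
        lo, hi = s["valid_wavelength_um"]
        # has k, then newer year, then wider span, then source_id
        return (not s["has_k"], -(s.get("year") or 0), -(hi - lo), s["source_id"])

    return min(candidates, key=tie_key)
```

Given a 100 nm thin-film source, a 50 nm thin-film source and a bulk source that all cover 0.532 µm, a request for a 100 nm film would select the 100 nm dataset, while a request for 600 nm would select the bulk one.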

4) Troubleshooting

“No source covers …” – The wavelength must be inside a source’s [λmin, λmax]. The error lists the top partial overlaps; choose another wavelength or add a source.
“CSV missing column ‘…’” – The CSV header must be exactly wavelength_um,n,k. Units are µm.
“Chosen source lacks k-values” – Either add k to the CSV or call with opts.allow_k_zero = true.
Legacy name not found – Add it to the alias table in nk.m (function local_alias_map()). For umbrella cases (e.g., ITO), the function augments the catalog at runtime.
Default units – If old code passed meters, set the default in local_parse_opts:

'units','m'   % instead of 'um'

5) Conventions & credits

Wavelength unit: all CSVs store wavelengths in µm. Callers can use opts.units ('m'/'nm'/'um').
Thickness unit: nm in the catalog.
Credit: Data derived from refractiveindex.info; cite the original publications listed in ref_string.

6) One-line rebuild (for future you)

$DBROOT = "C:\...\rii-db\database"
python .\rii2catalog.py --db-root "$DBROOT" --out-root .\nk-data --log-sampling `
  --shelves data/main data/organic data/glass data/other data/sopra data/3d
