# YADM dataset

## Intro / scope

The YADM dataset is a standardized export of repeated “You and Decision Making” sessions (cohorts), used to validate analytics automation and power the analytics dashboard.
## Exact reproduction spec

Read this alongside `Desktop/YADM_Data/Analytics/_build-exports.mjs`. The single source of truth for byte-identical output is the script itself; this block lists the rules in pipeline order so the current state can be re-derived without spelunking.
- Inputs: every `*.json` in `YADM_Data/` (not dotfiles), sorted by filename → one logical session per file.
- Constants: `ORIGIN = https://imd.sylva.ac`; CSV = UTF‑8 BOM + CRLF; `escapeCsvCell` strips/replaces embedded newlines in all cells.
- IDs: `userID` / `projectID` = first 6 chars of URL‑safe base64(SHA‑256(`kind + ':' + raw`)).
- “Latest” session (canonical poll titles + layout): the session with the greatest `activityMaxMs`; tie‑break by greater `sessionDateMs`.
- `getMeta(nameKey)`: prefer the latest session’s `answerNameToMeta`; else the longest `header` among sessions for that key.
- `pollSeq`: `latest.indices.pollOrder`, then append polls that appear only in older files; if an extra poll is Calories or Butterfly and Trees exists, insert it immediately after Trees, then append the remaining extras.
- Raw answer column order: walk `pollSeq`; for each poll, take ordered keys from the latest session, then from the other sessions; the Estimate poll keeps only keys whose prompts belong to the first 15 distinct prompts (unique prompt text); drop keys never answered in any session; always append butterfly calories keys from `collectButterflyNameKeysForExport` if missing (may add empty columns).
- Reorder answers: `reorderAnswersTreesButterflyTrust` → non‑trust columns split trees vs butterfly via `classifyTreesButterfly(prompt)`; then trust columns; output order is trees | butterfly | trust.
- Enrichment (mutates meta): `enrichEstimatePromptsPerKey` → `enrichEstimatePromptsBySlot` → `enrichRailroadFerrariPrompts` → `applyEstimateBoundsOverrides` → `applyTreesButterflyOverrides` (needs the butterfly linked‑number map).
- Tail moves: `reorderAnswersTailMoves` — “Compared to your peers…” → second‑last answer column; “Decision types…” → last.
- Trust columns: the first trust poll’s first two answer widgets are moved to answer indices 2–3 (after column 1).
- Layout tweak: answer columns 28–32 (1‑based, answer columns only) are moved to after column 7 (`reorderAnswersMoveCols28Through32AfterCol7`).
- Short names: `buildPollShortNames(pollSeq)`; `buildAnswerShortNames` → `Estimate-N`, `Trees-N` (title Trees), `Butterfly-N` (title Calories/Butterfly), else `PollShort-k` / suffixed; append `-legacy` if `nameKey` ∉ latest `answerOrder`.
- Scores column order: `Trust`, `Trust 2`, `DeltaTrust`, then metrics in fixed metric key order mapped to latest poll titles: Calibration (`score_ab`), Ferrari, Cards, Railroad, Anchoring, Estimate, then `TotalScore`, `Performance`. The Trust score column uses the parsed first‑trust answer; Trust 2 / DeltaTrust come from the last‑trust answer keys; other cells come from `scoresByMetric`.
- Rounding: numeric scores and derived fields → 2 decimal places; empty string if missing / non‑finite.
- Row filters: Scores / Participants — named user (first+last) and ≥1 non‑empty score after fill; Answers — same users, but a row is emitted only if ≥1 non‑empty answer; History — all sessions; `sessionDate` = calendar day of the earliest activity timestamp (never `publishedAt`); `participantCount` = roster `namedUserCount` if set, else counted named participants with any answer.
- Outputs: `answers.csv` uses five header rows + data from row 6; `meta.csv` = scalar row + blank line + `answersQuestionCatalog` table; `YADM_Data.xlsx` renames the reserved sheet History → `Session_history`; zip `YAMD_Data.zip` = the five CSVs.
- Participants columns = `firstName`, `lastName`, then `scoreColsOrdered` identical to Scores, then `reportLink`, `sessionLabel`.
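The ID rule above can be sketched in Node (a minimal sketch; the `kind + ':' + raw` concatenation follows the spec, but `shortId` is an illustrative name, not necessarily the script’s helper):

```javascript
import { createHash } from 'node:crypto';

// First 6 chars of URL-safe base64 of SHA-256(kind + ':' + raw).
// 'base64url' is Node's URL-safe base64 alphabet (- and _ instead of + and /).
function shortId(kind, raw) {
  return createHash('sha256')
    .update(`${kind}:${raw}`)
    .digest('base64url')
    .slice(0, 6);
}

// Deterministic per input: the same (kind, raw) always yields the same ID.
console.log(shortId('user', 'alice@example.com'));
```

Because the hash is keyed by `kind`, a user and a project derived from the same raw string still get different IDs.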
## Processing stages in code

The exporter (`_build-exports.mjs`) runs `main()` in twelve commented stages: load sessions → pick latest → `pollSeq` → build answer column order → trees/butterfly/trust reorder → enrich prompts → tail moves → trust columns + block move → short names & catalog → score column plan → emit rows → write files / XLSX / zip. Match those stage comments to the numbered rules above.
- Source: manually downloaded Results JSON exports (one JSON per cohort session) from the Sylva admin Results view.
- Output location on the local machine: `YADM_Data/Analytics/` (CSV/JSON/JS artifacts and the `YAMD_Data.zip` bundle).
- Sessions: 30 sessions in the current folder set.
- Participants (filtered): 994 (named participants with at least one score; see the rules below).
- Polls / scores (current canonical set): `Trust`, `Trust 2`, `Calibration`, `Ferrari`, `Cards`, `Railroad`, `Anchoring`, `Estimate`.
- Derived score fields: `DeltaTrust` and `TotalScore` (see the Scores section).
- Identifiers: `userID` and `projectID` in Answers and Scores are exported as stable 6‑character hashes (deterministic per input).
## Data structure (the five exports)

We produce five datasets for the same source sessions. Each dataset is exported as:
- CSV: for spreadsheet use and dashboard ingestion
- JSON: canonical structured form for programmatic checks
The five datasets are:
- Answers
- Scores
- Participants
- History
- Meta
## Answers

### What it is

One row per participant per session, containing the normalized answers to the YADM flow questions.
- Rows: participants who have at least one answer recorded for the session.
- Dropped columns: any question column that has no non-empty answers in any session is omitted entirely (including polls such as Decision-style when unused).
- Columns (prefix): `Index`, `userID`, `projectID`.
- Columns (answers): ordered chiefly by the latest-session flow (with the tweaks below). The Answers sheet uses five header rows — participant rows begin on row 6 (`Index`, `userID`, `projectID`, then values under each `Short_name` from row 2):
  - Column A labels each metadata row (`Index`, `Short_name`, `Prompt`, `Answer_type`); columns B–C are left blank on rows 1–4 so participant metadata does not sit under the wrong row labels.
  - Row 1 — numeric index: `1…N` for answer columns only (after the three prefix columns).
  - Row 2 — `Short_name`: compact names; Estimate, Trees, and Butterfly use `Estimate-1…`, `Trees-1…`, `Butterfly-1…` in export order (no redundant middle segment like `Trees-1-1`). Other polls use `Poll-1`, `Poll-2`, … when a poll has multiple widgets. Columns only present in older cohort JSON get a `-legacy` suffix (e.g. `Estimate-41-legacy`).
  - Row 3 — `Prompt`: full question/task text; for Estimate number inputs, `(Lower)` or `(Upper)` is appended to the paired statement (first column of a pair = Lower, second = Upper).
  - Row 4 — `Answer_type`: best-effort widget type + options/range (sliders omit step size — it varies by client and can mislead readers).
  - Row 5 — column keys: `Index`, `userID`, `projectID` in A–C only; columns D onward are left blank (question keys stay on row 2). This separates the multi-row question headers from the data block.
- Trust placement (answers): the first trust poll’s first two answer widgets (e.g. the short names under that poll’s numbering) are forced to answer column indices 2 and 3 (immediately after the first answer column, typically traps). Separate trust polls (e.g. closing trust) keep their own columns later in the sheet.
- Stable layout tweak: answer columns 28–32 (1-based among answer columns only) are moved to sit immediately after column 7 so Trees / Butterfly sit next to the early exercises for this export (column indices in `answersQuestionCatalog` follow this order).
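The five-row header layout can be illustrated with a small sketch (`buildHeaderRows` and the catalog shape are hypothetical; the real exporter derives these rows from session metadata):

```javascript
// Build the five Answers header rows from a (hypothetical) column catalog.
// Row 1: numeric index, row 2: Short_name, row 3: Prompt, row 4: Answer_type,
// row 5: column keys (Index/userID/projectID in A–C only).
// Columns B–C stay blank on rows 1–4; column A carries the row label.
function buildHeaderRows(catalog) {
  return [
    ['Index', '', '', ...catalog.map((_, i) => String(i + 1))],
    ['Short_name', '', '', ...catalog.map((c) => c.short_name)],
    ['Prompt', '', '', ...catalog.map((c) => c.prompt)],
    ['Answer_type', '', '', ...catalog.map((c) => c.answer_type)],
    ['Index', 'userID', 'projectID', ...catalog.map(() => '')],
  ];
}
```

Participant data rows then start on row 6, aligned under the same columns.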
### Notable normalization rules

- Flow alignment: we use the latest session as the canonical ordering, and append older/extra question variants if they are distinct.
- Layout `name` collisions: answer fields are keyed internally by `(item.name)` + poll card title, because the same `name` (e.g. a reused `…_numberinput_1`) can appear on more than one poll; without this, a later card would steal the label from an earlier one.
- Cards answer translation: selector choices are translated using the image filename mapping: `choice1 → 5`, `choice2 → 8`, `choice3 → red`, `choice4 → blue`.
- Estimate bounds: estimate questions are represented as two columns per question, `(Lower)` and `(Upper)`:
  - We include the current 5 questions, plus older distinct questions (when titles differ), capped at 15 total estimate questions.
  - Placeholders such as “Q9” are replaced when possible with the longest non-placeholder prompt for that answer id across exports; if that still fails, we fill from the same ordinal Lower/Upper slot (paired estimate index) using the richest prompt found in any session for that slot.
  - Number inputs are paired in flow order; an unmatched input at the end of one segment can pair with the next estimate input (carry), so prompts such as “Nile river” align with Lower/Upper pairs where possible.
- Trees vs Butterfly (four column roles: two for Trees, at least two for Butterfly):
  - Trees: “Taller or shorter” and “Height estimate” — classification looks for taller / shorter in the prompt.
  - Butterfly (legacy Calories card or Butterfly title): “More or less” and “Number estimate” — uses word-boundary more / less (so a substring match inside another word is never treated as “more/less”), or calories / apple, the Calories/Butterfly poll title, or a number input on the same poll card that follows a Butterfly direction question.
  - Trees columns are output first, then the Butterfly blocks. You may see more than two Butterfly headers when different cohorts used different direction-question wordings; each direction + number pair stays distinct.
- Railroad / Ferrari: labels prefer the long question string from any session; if only stubs like `Q6:` exist, we scrape the longest instructional text block from the poll card in the JSON and fall back to the longest prompt seen anywhere for that poll.
- No empty rows: participants with no answers in a session are excluded from `answers.csv`.
- CSV safety: all headers/cells are sanitized to remove embedded newlines and written with a UTF‑8 BOM and CRLF line endings for Excel/Sheets compatibility.
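The Trees/Butterfly direction heuristics can be sketched as ordered word-boundary checks (a simplified sketch over prompt text only; the real `classifyTreesButterfly` also consults poll titles and linked number inputs):

```javascript
// Simplified sketch of the direction-question heuristics. Checking Trees
// first, and using word boundaries for more/less, keeps substrings inside
// other words from being misread as a Butterfly direction.
function classifyPrompt(prompt) {
  const p = String(prompt).toLowerCase();
  if (/taller|shorter/.test(p)) return 'trees';
  if (/\bmore\b|\bless\b|calories|apple/.test(p)) return 'butterfly';
  return 'other';
}
```

Unmatched prompts fall through to `'other'` so unrelated polls (e.g. Ferrari) are never pulled into the Trees/Butterfly blocks.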
## Scores

### What it is

One row per participant per session, containing the computed scores for the canonical YADM polls.
- Rows: named participants with at least one score present.
- Columns: `userID`, `projectID`, then score columns titled by poll title (exact order): `Trust`, `Trust 2`, `DeltaTrust`, `Calibration`, `Ferrari`, `Cards`, `Railroad`, `Anchoring`, `Estimate`, `TotalScore`, `Performance`. `userID` and `projectID` are stable 6‑character hashes (deterministic per input).
### Normalization rules

- Column titles: derived from the latest session’s poll titles, mapping older naming changes onto the latest names.
- Order (after `Trust` / `Trust 2` / `DeltaTrust`): fixed metric key order — Calibration, Ferrari, Cards, Railroad, Anchoring, Estimate — then derived columns (see Shape); poll titles come from the latest session’s cards, but the column sequence is not the raw poll order.
- At most one score per poll (per participant).
- Rounding: scores are rounded to 2 decimal places.
- Trust: numeric answer parsed from the first trust poll’s answer widget (if missing, leave empty).
- Trust 2: numeric answer parsed from the last trust poll’s answer widget, so opening and closing trust stay distinct (if missing, leave empty).
- DeltaTrust: `Trust 2 − Trust` (empty if either is missing), placed immediately after `Trust 2`.
- TotalScore: sum of all non-trust scores: `Calibration + Ferrari + Cards + Railroad + Anchoring + Estimate` (empty if none are present; order matches the score columns).
- Performance: average of the present values among only `Calibration`, `Ferrari`, `Cards`, `Railroad`, `Anchoring`, `Estimate` (the same six as `TotalScore`; Trust and Trust 2 are not included). Empty if none of those six are present.
- No empty rows: participants without any score are excluded from `scores.csv`.
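The derived fields above can be sketched as follows (assuming metric scores arrive as numbers or are absent; `deriveScores` and `round2` are illustrative names, not the exporter’s helpers):

```javascript
// Sketch of the derived score fields. Missing / non-numeric cells are
// skipped, and a fully empty metric set yields '' (empty cell), per the spec.
const METRICS = ['Calibration', 'Ferrari', 'Cards', 'Railroad', 'Anchoring', 'Estimate'];

const round2 = (n) => Math.round(n * 100) / 100;

function deriveScores(row) {
  const present = METRICS.map((m) => row[m]).filter((v) => Number.isFinite(v));
  const sum = present.reduce((a, b) => a + b, 0);
  return {
    // DeltaTrust only when both trust values parsed.
    DeltaTrust:
      Number.isFinite(row.Trust) && Number.isFinite(row['Trust 2'])
        ? round2(row['Trust 2'] - row.Trust)
        : '',
    TotalScore: present.length ? round2(sum) : '',
    // Average over present metrics only; Trust / Trust 2 excluded.
    Performance: present.length ? round2(sum / present.length) : '',
  };
}
```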
## Participants

### What it is

A participant list for dashboard display: names, scores, and a report link placeholder, per session.
- Rows: named participants with at least one score present.
- Columns (order): `firstName`, `lastName`, the score columns (same titles and order as `scores.csv` — use `scoreColsOrdered` in the exporter), then `reportLink`, `sessionLabel`.
- Exclude missing names: participants without first+last name are excluded.
- Exclude missing scores: participants with all scores empty are excluded.
- Report link: a placeholder based on the staff Results URL with a `#participant=<userId>` anchor.
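A minimal sketch of the placeholder link; only the `ORIGIN` constant and the `#participant=<userId>` anchor come from the spec, while the `/results/<projectId>` path segment is a hypothetical stand-in for the staff Results URL:

```javascript
// ORIGIN matches the spec constant; the path is illustrative only — the real
// exporter takes the staff Results URL from the admin view.
const ORIGIN = 'https://imd.sylva.ac';

function reportLink(projectId, userId) {
  return `${ORIGIN}/results/${projectId}#participant=${userId}`;
}
```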
## History

### What it is

A per-session list for dashboards and rollups.
- Rows: one per session JSON file.
- Columns: `sessionName`, `sessionDate`, `participantCount`, `projectLink`.
- Session date: the calendar day of the earliest activity timestamp found in that session (answer/load/submit timestamps). We explicitly avoid using content `publishedAt` timestamps.
- Participant count: uses the named roster count for the session, so sessions with missing runtime data still have correct counts.
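The session-date rule can be sketched as below (assuming activity timestamps in epoch milliseconds and a UTC calendar day; the script’s exact timezone handling may differ):

```javascript
// Sketch: session date = calendar day (UTC) of the earliest activity
// timestamp. publishedAt is deliberately never consulted.
function sessionDate(activityTimestampsMs) {
  const earliest = Math.min(...activityTimestampsMs);
  return new Date(earliest).toISOString().slice(0, 10); // YYYY-MM-DD
}
```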
## Meta

### What it is

A compact summary for dashboards.

### Fields

- `totalParticipants`: number of rows in the Participants dataset
- `firstSessionDate`, `lastSessionDate`
- `numberOfAnswersRows`, `numberOfAnswersColumns`
- `numberOfScoresRows`, `numberOfScoresColumns`
- `answersQuestionCatalog`: array describing each answer column (the same content as Answers sheet rows 1–4 for columns D onward, transposed): `column_index`, `short_name`, `prompt`, `answer_type`. In `meta.csv` this appears after a blank line as a second table (`column_index`, …).
## XLSX workbook

- `YADM_Data.xlsx` is built with ExcelJS (install: `npm i exceljs` in the Analytics folder).
- Answers sheet: freeze panes after column C and row 5 (first data cell D6); rows 1–5 use light grid borders with a stronger bottom border under row 5; top-aligned, wrapped text; column widths tuned for labels vs answers.
- Meta sheet: scalar metrics as a two-column table; then a merged title row “Answers overview”; then the same `answersQuestionCatalog` columns as the second table in `meta.csv` (`column_index`, `short_name`, `prompt`, `answer_type`).
- Other sheets: freeze the header row, bold header, bottom border, wrapped top-aligned cells.
- Excel reserves the worksheet name History — that sheet is named `Session_history` in the workbook only (CSV/JSON filenames are unchanged).
## Building / refreshing the exports

The exporter lives at `Desktop/YADM_Data/Analytics/_build-exports.mjs` and reads session JSON files from the parent `YADM_Data/` folder.

Run it from the Analytics directory: `cd ~/Desktop/YADM_Data/Analytics`, then `node ./_build-exports.mjs`. This regenerates:

- `answers.csv|json|js`
- `scores.csv|json|js` (includes `TotalScore` and `Performance`)
- `participants.csv|json|js`
- `history.csv|json|js`
- `meta.csv|json|js`
- `~/Desktop/YADM_Data/Analytics/YADM_Data.xlsx` (styled workbook; Answers freeze C×5, five header rows incl. column keys, then data)
- `~/Desktop/YADM_Data/Analytics/YAMD_Data.zip` (CSV bundle, same folder as the CSVs)
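The CSV conventions used throughout (newline-free cells, UTF‑8 BOM, CRLF line endings) can be sketched as follows (a minimal sketch; the real `escapeCsvCell` may handle more cases):

```javascript
// Sketch of the CSV conventions: strip embedded newlines from every cell,
// quote and double embedded quotes when needed, join rows with CRLF, and
// prefix a UTF-8 BOM so Excel/Sheets detect the encoding.
function escapeCsvCell(value) {
  const s = String(value ?? '').replace(/\r?\n/g, ' '); // no embedded newlines
  return /[",]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function toCsv(rows) {
  const body = rows.map((r) => r.map(escapeCsvCell).join(',')).join('\r\n');
  return '\ufeff' + body + '\r\n'; // UTF-8 BOM + CRLF endings
}
```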
Notable behaviors implemented in the exporter:
- Trust / Trust 2 (scores): `Trust` comes from the first trust poll’s answer widget; `Trust 2` comes from the last trust poll’s answer widget. If a participant didn’t answer, the cell is left empty (no computed fallback).
- Trust answers: the first trust poll’s first two widgets are moved to indices 2 and 3 among answer columns (after the first question column).
- DeltaTrust: only computed when both trust values are present.
- Performance: average of the non-trust poll scores only (`Calibration`, `Ferrari`, `Cards`, `Railroad`, `Anchoring`, `Estimate`), over the cells that are present.
- Answer column ordering (before trust placement): Trees → Butterfly → Trust polls → everything else, grouped using `classifyTreesButterfly` on each column’s prompt/title (legacy behavior). `Short_name` still uses poll card titles for `Trees-*` / `Butterfly-*` only (Trees, Calories/Butterfly), so e.g. Ferrari stays `Ferrari-*` even if a prompt matches the butterfly heuristics.
- Two explicit tail moves: “Trust - Compared to your peers, …” is forced to second last; “Decision types - How much of your job …” is forced to the very last column.
- Ferrari label: prefers the short “how much / cost / price” instruction text from the Ferrari card over longer unrelated intro text when building the header.