
YADM dataset

The YADM dataset is a standardized export of repeated “You and Decision Making” sessions (cohorts) used to validate analytics automation and power the analytics dashboard.

Exact reproduction spec (read with Desktop/YADM_Data/Analytics/_build-exports.mjs)

Single source of truth for byte-identical output is the script; this block lists rules in pipeline order so the current state can be re-derived without spelunking.

  1. Inputs: every *.json in YADM_Data/ (not dotfiles), sorted by filename → one logical session per file.
  2. Constants: ORIGIN = https://imd.sylva.ac; CSV = UTF‑8 BOM + CRLF; escapeCsvCell strips/replaces embedded newlines in all cells.
  3. IDs: userID / projectID = first 6 chars of URL‑safe base64(SHA‑256(kind + ':' + raw)).
  4. “Latest” session (canonical poll titles + layout): the session with greatest activityMaxMs; tie‑break by greater sessionDateMs.
  5. getMeta(nameKey): prefer latest session’s answerNameToMeta; else the longest header among sessions for that key.
  6. pollSeq: latest.indices.pollOrder, then append polls that appear only in older files; if an extra poll is Calories or Butterfly and Trees exists, insert it immediately after Trees, then append remaining extras.
  7. Raw answer column order: walk pollSeq; for each poll, ordered keys from latest then other sessions; Estimate poll keeps only keys whose prompts belong to the first 15 distinct prompts (unique prompt text); drop keys never answered in any session; always append butterfly calories keys from collectButterflyNameKeysForExport if missing (may add empty columns).
  8. Reorder answers: reorderAnswersTreesButterflyTrust → non‑trust columns split trees vs butterfly via classifyTreesButterfly(prompt); then trust columns; output trees | butterfly | trust.
  9. Enrichment (mutates meta): enrichEstimatePromptsPerKey → enrichEstimatePromptsBySlot → enrichRailroadFerrariPrompts → applyEstimateBoundsOverrides → applyTreesButterflyOverrides (needs butterfly linked‑number map).
  10. Tail moves: reorderAnswersTailMoves: “Compared to your peers…” → second‑last answer column; “Decision types…” → last.
  11. Trust columns: first trust poll’s first two answer widgets moved to answer indices 2–3 (after column 1).
  12. Layout tweak: answer columns 28–32 (1‑based, answer columns only) moved to after column 7 (reorderAnswersMoveCols28Through32AfterCol7).
  13. Short names: buildPollShortNames(pollSeq); buildAnswerShortNames: Estimate‑N, Trees‑N (title Trees), Butterfly‑N (title Calories/Butterfly), else PollShort‑k / suffixed; append ‑legacy if nameKey ∉ latest answerOrder.
  14. Scores column order: Trust, Trust 2, DeltaTrust, then metrics in fixed metric key order mapped to latest poll titles: Calibration (score_ab), Ferrari, Cards, Railroad, Anchoring, Estimate, then TotalScore, Performance. Trust score column uses parsed first‑trust answer; Trust 2 / DeltaTrust from last‑trust answer keys; other cells from scoresByMetric.
  15. Rounding: numeric scores and derived fields → 2 decimal places; empty string if missing / non‑finite.
  16. Row filters: Scores / Participants — named user (first+last) and ≥1 non‑empty score after fill; Answers — same users but row emitted only if ≥1 non‑empty answer; History — all sessions; sessionDate = calendar day of earliest activity timestamp (never publishedAt); participantCount = roster namedUserCount if set else counted named participants with any answer.
  17. Outputs: answers.csv uses five header rows + data row 6; meta.csv = scalar row + blank line + answersQuestionCatalog table; YADM_Data.xlsx renames reserved sheet History → Session_history; zip YAMD_Data.zip = five CSVs.
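
Rule 3's ID derivation can be sketched as follows. This is a hypothetical reconstruction, not the exporter's exact code; `shortId` is an assumed helper name, and the `"base64url"` digest encoding (Node's URL‑safe base64) stands in for whatever URL‑safe encoding the script uses.

```javascript
import { createHash } from "node:crypto";

// Hypothetical helper mirroring rule 3: first 6 chars of
// URL-safe base64(SHA-256(kind + ':' + raw)).
function shortId(kind, raw) {
  return createHash("sha256")
    .update(`${kind}:${raw}`)
    .digest("base64url") // URL-safe base64 (Node >= 15.7)
    .slice(0, 6);
}
```

Because the hash is deterministic per input, `shortId("user", email)` yields the same 6‑character userID on every export run, while different kinds (`"user"` vs `"project"`) never collide on the same raw value by accident of prefixing.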

Participants columns = firstName, lastName, then scoreColsOrdered identical to Scores, then reportLink, sessionLabel.

The exporter (_build-exports.mjs) runs main() in twelve commented stages: load sessions → pick latest → pollSeq → build answer column order → trees/butterfly/trust reorder → enrich prompts → tail moves → trust columns + block move → short names & catalog → score column plan → emit rows → write files / XLSX / zip. Match those comments to the numbered rules above.
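
The latest-session selection of rule 4 can be sketched as below; `pickLatest` is a hypothetical name, and the sketch assumes each session object carries `activityMaxMs` and `sessionDateMs` fields as the rules describe.

```javascript
// Hypothetical sketch of rule 4: the canonical "latest" session is the one
// with the greatest activityMaxMs, ties broken by greater sessionDateMs.
function pickLatest(sessions) {
  return sessions.reduce((best, s) =>
    s.activityMaxMs > best.activityMaxMs ||
    (s.activityMaxMs === best.activityMaxMs && s.sessionDateMs > best.sessionDateMs)
      ? s
      : best);
}
```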

  • Source: manually downloaded Results JSON exports (one JSON per cohort session) from the Sylva admin Results view.
  • Output location on local machine: YADM_Data/Analytics/ (CSV/JSON/JS artifacts and YAMD_Data.zip bundle).
  • Sessions: 30 sessions in the current folder set.
  • Participants (filtered): 994 (named participants with at least one score; see rules below).
  • Polls / scores (current canonical set): Trust, Trust 2, Calibration, Ferrari, Cards, Railroad, Anchoring, Estimate.
  • Derived score fields: DeltaTrust and TotalScore (see Scores section).
  • Identifiers: userID and projectID in Answers and Scores are exported as stable 6‑character hashes (deterministic per input).

We produce five datasets for the same source sessions. Each dataset is exported as:

  • CSV: for spreadsheet use and dashboard ingestion
  • JSON: canonical structured form for programmatic checks

The five datasets are:

  1. Answers
  2. Scores
  3. Participants
  4. History
  5. Meta

Answers

One row per participant per session, containing the normalized answers to the YADM flow questions.

  • Rows: participants who have at least one answer recorded for the session.

  • Dropped columns: any question column that has no non-empty answers in any session is omitted entirely (including whole polls, e.g. Decision-style polls, when unused).

  • Columns (prefix): Index, userID, projectID

  • Columns (answers): ordered chiefly by the latest-session flow (with tweaks below). The Answers sheet uses five header rows — participant rows begin on row 6 (Index, userID, projectID, then values under each Short_name from row 2):

  • Column A: labels each metadata row (Index, Short_name, Prompt, Answer_type); columns B–C are left blank on rows 1–4 so participant metadata does not sit under the wrong row labels.

  • Row 1 — numeric index: 1…N for answer columns only (after the three prefix columns).

  • Row 2 — Short_name: compact names; Estimate, Trees, and Butterfly use Estimate-1, Trees-1, Butterfly-1 in export order (no redundant middle segment like Trees-1-1). Other polls use Poll-1, Poll-2, … when a poll has multiple widgets. Columns only present in older cohort JSON get a ‑legacy suffix (e.g. Estimate-41-legacy).

  • Row 3 — Prompt: full question/task text; for Estimate number inputs, (Lower) or (Upper) is appended to the paired statement (first column of a pair = Lower, second = Upper).

  • Row 4 — Answer_type: best-effort widget type + options/range (sliders omit step size — it varies by client and can mislead readers).

  • Row 5 — column keys: Index, userID, projectID in A–C only; columns D onward are left blank (question keys stay on row 2). Separates the multi-row question headers from the data block.

  • Trust placement (answers): the first trust poll’s first two answer widgets (e.g. short names under that poll’s numbering) are forced to answer column indices 2 and 3 (immediately after the first answer column, typically traps). Separate trust polls (e.g. closing trust) keep their own columns later in the sheet.

  • Stable layout tweak: answer columns 28–32 (1-based among answer columns only) are moved to sit immediately after column 7 so Trees / Butterfly sit next to the early exercises for this export (column indices in answersQuestionCatalog follow this order).

  • Flow alignment: we use the latest session as the canonical ordering, and append older/extra question variants if they are distinct.
  • Layout name collisions: answer fields are keyed internally by (item.name) + poll card title, because the same name (e.g. a reused …_numberinput_1) can appear on more than one poll; without this, a later card would steal the label from an earlier one.
  • Cards answer translation: selector choices are translated using the image filename mapping:
    • choice1 → 5, choice2 → 8, choice3 → red, choice4 → blue
  • Estimate bounds: estimate questions are represented as two columns per question:
    • (Lower) and (Upper)
    • we include the current 5 questions, plus older distinct questions (when titles differ), capped at 15 total estimate questions.
    • placeholders such as Q9 are replaced when possible with the longest non-placeholder prompt for that answer id across exports; if that still fails, we fill from the same ordinal Lower/Upper slot (paired estimate index) using the richest prompt found in any session for that slot.
    • numberinputs are paired in flow order; an unmatched input at the end of one segment can pair with the next estimate input (carry) so prompts such as “Nile river” align with Lower/Upper pairs where possible.
  • Trees vs Butterfly (four column roles: two for Trees, at least two for Butterfly):
    • Trees: Taller or shorter and Height estimate — classification looks for taller / shorter in the prompt (it does not treat the substring less inside shorter as “more/less”).
    • Butterfly (legacy Calories card or Butterfly title): More or less and Number estimate — uses word-boundary more / less (or calories / apple), the Calories/Butterfly poll title, or a number input on the same poll card that follows a Butterfly direction question.
    • Trees columns are output first, then Butterfly blocks. You may see more than two Butterfly headers when different cohorts used different direction-question wordings; each direction + number pair stays distinct.
  • Railroad / Ferrari: labels prefer the long question string from any session; if only stubs like Q6: exist, we scrape the longest instructional text block from the poll card in the JSON and fall back to the longest prompt seen anywhere for that poll.
  • No empty rows: participants with no answers in a session are excluded from answers.csv.
  • CSV safety: all headers/cells are sanitized to remove embedded newlines and written with a UTF‑8 BOM + CRLF line endings for Excel/Sheets compatibility.
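
The CSV safety rules above can be sketched as follows. This is a minimal sketch, not the script's exact `escapeCsvCell`: the source only says embedded newlines are stripped/replaced and that output is UTF‑8 BOM + CRLF, so the newline-to-space replacement and the quoting of commas/quotes are assumptions labeled in the comments.

```javascript
// Hypothetical sketch of the CSV safety rules: embedded newlines are
// replaced (here with a space -- an assumption), cells containing commas or
// quotes are quoted, and the file gets a UTF-8 BOM with CRLF line endings.
function escapeCsvCell(value) {
  const s = String(value ?? "").replace(/\r?\n/g, " ");
  return /[",]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function toCsv(rows) {
  return "\uFEFF" + rows.map(r => r.map(escapeCsvCell).join(",")).join("\r\n") + "\r\n";
}
```

The BOM is what lets Excel detect UTF‑8 on double-click; CRLF keeps Sheets/Excel row parsing consistent across platforms.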

Scores

One row per participant per session, containing the computed scores for the canonical YADM polls.

  • Rows: named participants with at least one score present.
  • Columns: userID, projectID, then score columns titled by poll title (exact order):
    • Trust, Trust 2, DeltaTrust, Calibration, Ferrari, Cards, Railroad, Anchoring, Estimate, TotalScore, Performance
    • userID and projectID are stable 6‑character hashes (deterministic per input).
  • Column titles: derived from the latest session’s poll titles, mapping older naming changes onto the latest names.
  • Order (after Trust / Trust 2 / DeltaTrust): fixed metric key order — Calibration, Ferrari, Cards, Railroad, Anchoring, Estimate — then derived columns (see Shape); poll titles come from the latest session’s cards but column sequence is not the raw poll order.
  • At most one score per poll (per participant).
  • Rounding: scores are rounded to 2 decimal places.
  • Trust: numeric answer parsed from the first trust poll’s answer widget (if missing, leave empty).
  • Trust 2: numeric answer parsed from the last trust poll’s answer widget (opening/closing trust therefore stay distinct). (If missing, leave empty.)
  • DeltaTrust: (Trust2 - Trust) (empty if either is missing), placed immediately after Trust 2.
  • TotalScore: sum of all non-trust scores: Calibration + Ferrari + Cards + Railroad + Anchoring + Estimate (empty if none present; order matches the score columns).
  • Performance: average of present values among only Calibration, Ferrari, Cards, Railroad, Anchoring, Estimate (same six as TotalScore; Trust and Trust 2 are not included). Empty if none of those six are present.
  • No empty rows: participants without any score are excluded from scores.csv.
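
The derived fields (DeltaTrust, TotalScore, Performance) can be sketched as below. `deriveScores` and `round2` are hypothetical names; the sketch assumes score cells arrive as numbers when present and that `""` marks a missing cell, as in the export.

```javascript
// The six non-trust metrics, in the same fixed order as the score columns.
const NON_TRUST = ["Calibration", "Ferrari", "Cards", "Railroad", "Anchoring", "Estimate"];

// Rounding rule: 2 decimal places; empty string if missing / non-finite.
function round2(n) {
  return Number.isFinite(n) ? Math.round(n * 100) / 100 : "";
}

function deriveScores(row) {
  const t1 = row.Trust, t2 = row["Trust 2"];
  // DeltaTrust only when both trust values are present.
  const deltaTrust = Number.isFinite(t1) && Number.isFinite(t2) ? round2(t2 - t1) : "";
  const present = NON_TRUST.map(k => row[k]).filter(Number.isFinite);
  // TotalScore = sum of present non-trust scores; Performance = their average.
  const totalScore = present.length ? round2(present.reduce((a, b) => a + b, 0)) : "";
  const performance = present.length
    ? round2(present.reduce((a, b) => a + b, 0) / present.length)
    : "";
  return { DeltaTrust: deltaTrust, TotalScore: totalScore, Performance: performance };
}
```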

Participants

A participant list for dashboard display: names, scores, and a report link placeholder, per session.

  • Rows: named participants with at least one score present.
  • Columns (order):
    • firstName, lastName,
    • the score columns (same titles and order as scores.csv — use scoreColsOrdered in the exporter),
    • reportLink, sessionLabel
  • Exclude missing names: participants without first+last name are excluded.
  • Exclude missing scores: participants with all scores empty are excluded.
  • Report link: placeholder based on staff Results URL with a #participant=<userId> anchor.
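
The report link rule amounts to a one-liner; the base Results URL is taken as a parameter here because its exact path is not specified above, and `reportLink` is a hypothetical name.

```javascript
// Hypothetical sketch: append a #participant=<userId> anchor to the
// staff Results URL (the URL shape itself is an assumption).
const reportLink = (resultsUrl, userId) => `${resultsUrl}#participant=${userId}`;
```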

History

A per-session list for dashboards and rollups.

  • Rows: one per session JSON file.
  • Columns: sessionName, sessionDate, participantCount, projectLink
  • Session date: day of the earliest activity timestamp found in that session (answer/load/submit timestamps). (We explicitly avoid using content publishedAt timestamps.)
  • Participant count: uses the named roster count for the session (so sessions with missing runtime data still have correct counts).
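
The sessionDate rule can be sketched as below. The UTC day boundary is an assumption (the script's timezone handling is not specified above), and `sessionDate` is a hypothetical name.

```javascript
// Hypothetical sketch: sessionDate = calendar day (UTC here, as an
// assumption) of the earliest activity timestamp -- never publishedAt.
function sessionDate(activityTimestampsMs) {
  const earliest = Math.min(...activityTimestampsMs);
  return new Date(earliest).toISOString().slice(0, 10); // YYYY-MM-DD
}
```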

Meta

A compact summary for dashboards.

  • totalParticipants: number of rows in the Participants dataset
  • firstSessionDate, lastSessionDate
  • numberOfAnswersRows, numberOfAnswersColumns
  • numberOfScoresRows, numberOfScoresColumns
  • answersQuestionCatalog: array describing each answer column (same content as answers sheet rows 1–4 for columns D onward, transposed): column_index, short_name, prompt, answer_type. In meta.csv this appears after a blank line as a second table (column_index, …).
  • YADM_Data.xlsx is built with ExcelJS (install: npm i exceljs in the Analytics folder).
  • Answers sheet: freeze panes after column C and row 5 (first data cell D6); rows 1–5 use light grid borders with a stronger bottom border under row 5; top-aligned, wrap text; column widths tuned for labels vs answers.
  • Meta sheet: scalar metrics as a two-column table; then a merged title row Answers overview, then the same answersQuestionCatalog columns as the second table in meta.csv (column_index, short_name, prompt, answer_type).
  • Other sheets: freeze the header row, bold header, bottom border, wrapped top-aligned cells.
  • Excel reserves the worksheet name History — that sheet is named Session_history in the workbook only (CSV/JSON filenames unchanged).

The exporter lives at Desktop/YADM_Data/Analytics/_build-exports.mjs and reads session JSON files from the parent YADM_Data/ folder.

Run it from the Analytics directory:

```sh
cd ~/Desktop/YADM_Data/Analytics
node ./_build-exports.mjs
```

This regenerates:

  • answers.csv|json|js
  • scores.csv|json|js (includes TotalScore and Performance)
  • participants.csv|json|js
  • history.csv|json|js
  • meta.csv|json|js
  • ~/Desktop/YADM_Data/Analytics/YADM_Data.xlsx (styled workbook; the Answers sheet freezes panes after column C and row 5, with five header rows including the column-key row, then data)
  • ~/Desktop/YADM_Data/Analytics/YAMD_Data.zip (CSV bundle, same folder as the CSVs)

Notable behaviors implemented in the exporter:

  • Trust / Trust 2 (scores): Trust comes from the first trust poll’s answer widget; Trust 2 comes from the last trust poll’s answer widget. If a participant didn’t answer, the cell is left empty (no computed fallback).
  • Trust answers: the first trust poll’s first two widgets are moved to indices 2 and 3 among answer columns (after the first question column).
  • DeltaTrust: only computed when both trust values are present.
  • Performance: average of non-trust poll scores only (Calibration, Ferrari, Cards, Railroad, Anchoring, Estimate), over cells that are present.
  • Answer column ordering (before trust placement): Trees → Butterfly → Trust polls → everything else, grouped using classifyTreesButterfly on each column’s prompt/title (legacy behavior). Short_name still uses poll card titles for Trees-* / Butterfly-* only (Trees, Calories/Butterfly), so e.g. Ferrari stays Ferrari-* even if a prompt matches butterfly heuristics. Two explicit tail moves:
    • Trust - Compared to your peers, … is forced to second last.
    • Decision types - How much of your job … is forced to the very last column.
  • Ferrari label: prefers the short “how much / cost / price” instruction text from the Ferrari card over longer unrelated intro text when building the header.
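
The Trees/Butterfly grouping heuristic can be sketched as below. This is a partial, hypothetical reconstruction of classifyTreesButterfly: it shows the word-boundary matching described above (so e.g. "less" inside another word does not count), but omits the linked number-input lookup on the same poll card.

```javascript
// Hypothetical sketch of classifyTreesButterfly: word-boundary matching on
// the prompt, plus the Calories/Butterfly poll title as a butterfly signal.
function classifyTreesButterfly(prompt, pollTitle = "") {
  const p = String(prompt).toLowerCase();
  if (/\b(taller|shorter)\b/.test(p)) return "trees";
  if (/\b(more|less|calories|apple)\b/.test(p) || /calories|butterfly/i.test(pollTitle))
    return "butterfly";
  return null; // not a Trees/Butterfly column
}
```

The `\b` anchors are the point: a prompt like "a blameless estimate" contains the substring "less" but matches neither group.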