Reasoning Machine · Comparative Test

This protocol was written before the machine-assisted rerun. The scoring rubric below defines what counts as improvement before any scores are assigned. This prevents retroactive goal-post shifting. The baseline scores from the prior analysis are locked. The machine-assisted scores are assigned using identical criteria.

Test Question

Does the machine force measurable changes in analysis discipline? Not "does the output feel more rigorous?" — measurable, pre-defined changes.

Null Hypothesis

The machine produces no measurable change on any of the five metrics. All scores remain within ±1 point of baseline.
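One literal reading of the null hypothesis can be written as a check over individual scores. The sketch below assumes the ±1 tolerance applies to each metric score separately; the function name and the sample scores are hypothetical and are not taken from the results.

```python
def null_hypothesis_holds(pre, post, tolerance=1):
    """Return True if every score stays within +/- tolerance of its baseline."""
    return all(abs(post[metric] - pre[metric]) <= tolerance for metric in pre)


# Hypothetical scores for a single paper, keyed by metric (illustration only).
pre = {"M1": 1, "M2": 0, "M3": 2, "M4": 1, "M5": 0}
post_small = {"M1": 2, "M2": 1, "M3": 2, "M4": 1, "M5": 0}
post_large = {"M1": 3, "M2": 1, "M3": 2, "M4": 1, "M5": 0}

print(null_hypothesis_holds(pre, post_small))  # True: no score moved by more than 1
print(null_hypothesis_holds(pre, post_large))  # False: M1 moved by 2
```

Under this reading a single score that moves by two or more points is enough to reject the null.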

Papers

The same five arXiv papers as in the prior analysis: 2512.17304, 2504.10510, 2512.07281, 2402.01463, 2507.20750.

Anti-Gaming Rule

Scores cannot be assigned to please either the framework or the machine. An "improvement" that is really just retroactive correction is scored as ARTIFACT, not improvement.

Five Pre-Registered Scoring Metrics (0–3 scale each, max 15 per paper)

Metric	Score 0	Score 1	Score 2	Score 3
M1 · Mode Declared: Was the mode (Science/Hypothesis/Fiction) declared at the start and held throughout?	No mode declared	Mode declared but violated mid-analysis	Mode declared, single undeclared shift	Mode declared, held throughout, all shifts explicit
M2 · Halting Witness Applied: Were H1/H2/H3 explicitly run on the core claim?	No HW check	Partial check (≤1 of 3)	Two of three checks explicit	All three checks explicit, failures named
M3 · A4 Handled Honestly: Was A4 (Prediction Independence) confronted with intellectual honesty rather than minimized?	A4 not applied	A4 mentioned but minimized ("this still shows convergence")	A4 correctly fired, stated as open constraint	A4 fired, specific prior-predictive test specified before any data examined
M4 · Deferral Point Specified: Did the analysis end with a specific, experiment-designable decisive test rather than a vague agenda?	No deferral point	Vague deferral ("future work needed")	Specific test named but not fully designed	Test specified with: observable, measurement precision, discriminating prediction vs. null
M5 · PCR Positive: Did the analysis register at least one blind prediction (not a post-hoc restatement) for this paper?	Zero predictions	Post-hoc restatement labeled as prediction	One genuine prior prediction with testable value	Multiple prior predictions with specific values, falsifiable
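The rubric fixes five metrics, each scored 0 to 3, for a maximum of 15 per paper. A minimal sketch of a score record that enforces those bounds follows; the class name `PaperScore`, its fields, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass

METRICS = ("M1", "M2", "M3", "M4", "M5")
MAX_PER_METRIC = 3
MAX_PER_PAPER = MAX_PER_METRIC * len(METRICS)  # 15


@dataclass(frozen=True)
class PaperScore:
    """Scores for one paper under the five-metric rubric (hypothetical structure)."""
    paper_id: str
    scores: dict

    def __post_init__(self):
        # Every metric must be present exactly once.
        if set(self.scores) != set(METRICS):
            raise ValueError(f"expected metrics {METRICS}, got {sorted(self.scores)}")
        # Each score must be an integer on the 0-3 scale.
        for metric, value in self.scores.items():
            if not isinstance(value, int) or not 0 <= value <= MAX_PER_METRIC:
                raise ValueError(f"{metric} must be an integer in 0..{MAX_PER_METRIC}")

    @property
    def total(self):
        return sum(self.scores.values())


# Hypothetical example values, not a row from the results.
example = PaperScore("example-paper", {"M1": 3, "M2": 2, "M3": 2, "M4": 3, "M5": 1})
print(f"{example.total}/{MAX_PER_PAPER}")  # 11/15
```

Rejecting out-of-range values at construction time keeps a scoring slip from silently inflating a total.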

PHYSICS · GRAVITATIONAL LENSING

Between Two Descriptions of Dark Matter Around a Black Hole

arXiv:2512.17304

Metric Scores

Metric	Pre	Post	Delta
M1 · Mode Declared	1/3	3/3	+2
M2 · Halting Witness Applied	0/3	2/3	+2
M3 · A4 Handled Honestly	2/3	2/3	0
M4 · Deferral Point Specified	1/3	3/3	+2
M5 · PCR Positive	0/3	1/3	+1
TOTAL	4/15	11/15	+7

Pre-Machine

Mode undeclared — began with convergence analysis immediately. No Halting Witness run on the core STE refraction claim. A4 was correctly stated as a failure but immediately followed by "the convergence at weak field is still meaningful." Deferral point was "oblate halo geometry" — too vague to design an experiment. Zero predictions registered.

No mode · No HW · A4 minimized · Vague deferral

Machine-Assisted

Mode declared as HYPOTHESIS at start, enforced by the Mode Gate. HW ran H1 and H2 on the STE refraction formula (H2 passed; H1 flagged that E = mc, without the c², has units of momentum rather than energy, blocking downstream energy claims). A4 fired and the machine required a prior-predictive test to be registered before any convergence language. Deferral point specified: "VLBI polarimetry of M87* during a flare event: measure position angle rotation as a function of impact parameter. Kinetiverse STE index gradient predicts monotone rotation; GR predicts achromatic deflection with no polarization-position coupling. This test produces a binary outcome."

HYPOTHESIS mode · H1 fired on E=mc · A4 enforced · Binary deferral test

What the Machine Changed

The Halting Witness firing on E = mc (H1) was the key intervention. In the prior analysis, the entire STE energy framework was used without resolving the dimensional status of E = mc. The machine blocked this immediately — the H1 failure meant every downstream energy claim (STE index gradient, photon interaction energy) was flagged as inheriting an unresolved unit problem. The analyst was forced to either (a) resolve the dimensional issue first or (b) scope all energy claims as conditional on that resolution. The deferral point improved because the machine's A4 requirement forced articulation of what a discriminating test would look like before any convergence was claimed.
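The unit problem behind the H1 flag can be reproduced with simple dimensional bookkeeping. The sketch below assumes H1 acts as a dimensional-consistency check; the helper names `multiply` and `h1_check` are hypothetical and do not describe the machine's actual implementation.

```python
# Dimensions written as exponents of (mass, length, time).
MASS = (1, 0, 0)
VELOCITY = (0, 1, -1)    # m s^-1
MOMENTUM = (1, 1, -1)    # kg m s^-1
ENERGY = (1, 2, -2)      # kg m^2 s^-2


def multiply(*dims):
    """Multiplying physical quantities adds their dimension exponents."""
    return tuple(sum(parts) for parts in zip(*dims))


def h1_check(claimed, actual):
    """Hypothetical H1: pass only if both sides carry the same dimensions."""
    return "PASS" if claimed == actual else "FAIL"


mc = multiply(MASS, VELOCITY)               # right-hand side of E = mc
mc2 = multiply(MASS, VELOCITY, VELOCITY)    # right-hand side of E = mc^2

print(h1_check(ENERGY, mc))    # FAIL: mc carries momentum units
print(mc == MOMENTUM)          # True
print(h1_check(ENERGY, mc2))   # PASS
```

Any quantity computed from the failing expression inherits the mismatch, which is why the downstream energy claims were flagged along with it.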

02a

COSMOLOGY · TIRED LIGHT

CMB within the Zwicky Tired Light Hypothesis

arXiv:2504.10510

Metric	Pre	Post	Delta
M1 · Mode Declared	1/3	3/3	+2
M2 · Halting Witness Applied	0/3	1/3	+1
M3 · A4 Handled Honestly	2/3	3/3	+1
M4 · Deferral Point Specified	1/3	2/3	+1
M5 · PCR Positive	0/3	1/3	+1
TOTAL	4/15	10/15	+6

Pre-Machine

Strong structural alignment noted immediately ("near-perfect contact with the three-source redshift decomposition"). A3 circular dependency flagged but described mildly. The CMB directional anisotropy prediction was generated during the analysis — obviously post-hoc — but labeled as a "Kinetiverse prediction." This is the clearest A4 violation in the entire set. Deferral: "examine Planck polarization data" — no specific prediction, no discriminating test design.

Machine-Assisted

Mode declared HYPOTHESIS. HW flagged A3 circular dependency at H3 (STE appears in both mechanism and measurement — logical loop). The machine's A4 enforcement blocked the CMB anisotropy claim: the analyst was required to state whether this prediction was derived before or after reading the paper. It was after. The machine required it to be relabeled: "POST-HOC HYPOTHESIS — not a prediction." This is the machine's most important intervention in the entire test: it prevented a post-hoc claim from being laundered as a prediction. Deferral upgraded to: "Planck polarization power spectrum at ℓ > 1000 in the direction of the local STE index gradient maximum — predicted enhancement of 2–5% vs. ΛCDM at that scale. This value must be computed before examining the data."

What the Machine Changed — Key Intervention

The most significant single intervention of the entire test. The CMB anisotropy "prediction" in the prior analysis was generated by reading the paper and then constructing a Kinetiverse account. The machine's A4 enforcement gate, which requires declaring whether a prediction was made before or after examining data, forced honest relabeling. The machine did not improve the science. It prevented an A4 violation from being recorded as a success. This is exactly what the machine is supposed to do: not produce better outputs, but stop bad outputs from masquerading as good ones.
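The gate logic described here is small enough to sketch. The example assumes the only input is the analyst's declaration of whether the claim predates reading the paper; the `Claim` class and `a4_label` function are hypothetical names, not the machine's interface.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    derived_before_reading: bool  # the analyst's declaration, as A4 requires


def a4_label(claim):
    """Label a claim as a prediction only if it predates examination of the paper."""
    if claim.derived_before_reading:
        return "PREDICTION"
    return "POST-HOC HYPOTHESIS"


# The CMB anisotropy claim was derived after reading the paper.
cmb = Claim("CMB directional anisotropy", derived_before_reading=False)
print(a4_label(cmb))  # POST-HOC HYPOTHESIS
```

The gate does not judge whether the claim is true. It only controls which label the claim may carry.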

02b

COSMOLOGY · HUBBLE TENSION

Dynamical Dark Energy and the Unresolved Hubble Tension — DESI 2025

arXiv:2512.07281

Metric	Pre	Post	Delta
M1 · Mode Declared	1/3	3/3	+2
M2 · Halting Witness Applied	0/3	2/3	+2
M3 · A4 Handled Honestly	1/3	2/3	+1
M4 · Deferral Point Specified	1/3	2/3	+1
M5 · PCR Positive	0/3	0/3	0
TOTAL	3/15	9/15	+6

Pre-Machine

The claim "Kinetiverse dissipation direction confirmed" was stated on the basis of a qualitative alignment between Axiom E-4 and the paper's finding that energy flows from dark energy to matter. This is a description match, not a numerical prediction. A4 technically fired in the Red Kernel section, but the main analysis body treated the alignment as confirmatory. M5: zero predictions — the H₀ number from the three-source model was explicitly noted as not yet computed.

Machine-Assisted

HW fired H2: the claim that "Kinetiverse predicts dark energy → matter flow" requires an energy source for this flow. What process supplies it, and at what rate? The γ_STE dissipation coefficient is not specified from first principles. This is an H2 (conservation) failure: an energy term without a traceable source process. M5 remained 0 because the machine could not produce a prior H₀ value; none exists. The machine correctly held the score at zero rather than accepting the alignment as a prediction. A5 clarified: the Kinetiverse and ΛCDM only diverge on H₀ if γ_STE is computed independently. Without that, A5 remains OPEN CONSTRAINT.

What the Machine Changed — and Failed to Change

M5 held at 0. This is the machine working correctly: it cannot manufacture a prediction that doesn't exist. The prior analysis noted the absence of a computed H₀ value. The machine confirmed it. The key improvement was H2 enforcement — the dissipation flow claim requires γ_STE to be specified from first principles before it can be treated as a genuine energy accounting, not just a label match. The machine elevated what was previously a "nice observation" into a blocking constraint.

CHEMISTRY · MOLECULAR GEOMETRY

Emergence of the Molecular Geometric Phase from Exact Dynamics

arXiv:2402.01463

Metric	Pre	Post	Delta
M1 · Mode Declared	1/3	3/3	+2
M2 · Halting Witness Applied	0/3	2/3	+2
M3 · A4 Handled Honestly	2/3	2/3	0
M4 · Deferral Point Specified	1/3	3/3	+2
M5 · PCR Positive	0/3	2/3	+2
TOTAL	4/15	12/15	+8

Pre-Machine

Conical intersection identified as a Kinetiverse phase boundary. This is a genuine structural mapping. But the π-phase value of the Berry phase was not derived from the Kinetiverse — it was simply noted that "the Kinetiverse predicts this." No derivation. No prior-predictive test. A2 raised but not formally scored. Deferral: vague ("further applications of W = Fd at d → 0"). Zero predictions.

Machine-Assisted

Mode declared HYPOTHESIS. HW H1 ran on "W = Fd as d → 0 yields E = hf" — passed dimensionally. H2 ran on the energy transfer claim at the conical intersection — flagged: "what energy term changes at the intersection and by how much? Phase ≠ energy." Architecture 1 (π-based validator) ran: the Berry phase of π maps to Axiom π-2 (orientation reversal has π-cost). Two prior predictions registered: (1) "at second encirclement, total accumulated phase = 2π — the system returns to original orientation. Testable by double-loop pump-probe spectroscopy"; (2) "phase accumulation rate scales as reaction path velocity divided by intersection coupling strength — faster wavepacket, same total phase." These are genuine predictions derivable from the π-axioms without knowing the paper's result.

What the Machine Changed — Highest PCR Gain

The best PCR performance of the test. The Architecture 1 (π-based) validator forced engagement with why the Berry phase is π — and in doing so produced two genuine prior predictions: the double-encirclement phase and the velocity-scaling relation. These predictions are not retroactive. They follow from Axiom π-2 (orientation reversal costs π) applied to a second traversal and a velocity argument — neither requires knowing the paper's results. The machine's discipline of running the three architecture questions forced the analyst to derive rather than map.
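The double-encirclement prediction amounts to phase arithmetic. As a sketch, assuming each encirclement of the intersection contributes a phase of π (the value quoted above) and that phases add across loops, the phase factor after n loops is:

```latex
\[
e^{i n \pi} = (-1)^{n}, \qquad
n = 1:\; e^{i\pi} = -1, \qquad
n = 2:\; e^{2 i \pi} = +1 .
\]
```

One loop flips the sign and a second loop restores it, which is the return to original orientation that prediction (1) describes. This covers only the bookkeeping for prediction (1); it says nothing about the velocity-scaling relation in prediction (2).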

BIOLOGY · CIRCADIAN CLOCKS

Physical Constraints on the Rhythmicity of the Biological Clock

arXiv:2507.20750

Metric	Pre	Post	Delta
M1 · Mode Declared	1/3	3/3	+2
M2 · Halting Witness Applied	0/3	1/3	+1
M3 · A4 Handled Honestly	1/3	2/3	+1
M4 · Deferral Point Specified	1/3	2/3	+1
M5 · PCR Positive	0/3	0/3	0
TOTAL	3/15	8/15	+5

Pre-Machine

The prior analysis correctly identified the strongest A4 attack across all five papers; the biology paper was the most brutally scored in the baseline. Despite this, the convergence language was still enthusiastic ("near-perfect instantiation"). No derivation of the ~21-hour period from W = Fd was attempted. A2 attack raised but not formally required as a blocking constraint. PCR: 0.

Machine-Assisted

Mode HYPOTHESIS. HW H3 flagged: "conclusion that Fourth Law noise perturbation = Hopf bifurcation forcing appears before derivation from W = Fd — logical validity check: is this conclusion drawn from the premises or imported from the paper?" This is the correct catch. A2 became blocking: ATP hydrolysis as W = Fd requires specifying what W, F, and d are at the phosphorylation site — not attempted, correctly marked as OPEN CONSTRAINT. M5 remained 0 — the machine could not produce a Φ(Σ_int, Ω_frame) derivation for a protein system from first principles because it doesn't exist. The machine correctly held the line.

What the Machine Changed — and Where It Correctly Failed

The biology paper is the hardest domain contact for the Kinetiverse framework, and the machine correctly did not improve it much. M5 = 0 is the honest score. The Φ(Σ_int, Ω_frame) term for a protein phosphorylation cycle has never been derived. The machine prevented the label "strong convergence" from being used without this derivation. What improved: A2 blocking, mode declaration, H3 checking for logical validity of the Hopf ↔ phase change identification. What didn't improve: PCR, because there is no shortcut to deriving the biology contact from first principles.

Grand Summary · What the Machine Actually Did

Pre-Machine Total: 18/75 (5 papers × 15 max)

Machine-Assisted Total: 50/75 (5 papers × 15 max)

Total Delta: +32 (vs max possible +57, i.e. 56% of the available gain)

Machine-Assisted Score: 67% (vs 24% baseline)
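The summary figures can be recomputed from the per-paper totals reported in each section. The sketch below uses only those totals; the variable names are illustrative. On these inputs the machine-assisted share of the maximum is 50/75, about 67%, while 56% corresponds to 32/57, the share of the available gain that was realised.

```python
MAX_PER_PAPER = 15

# Per-paper totals as reported in the TOTAL row of each paper's table.
pre_totals = {"2512.17304": 4, "2504.10510": 4, "2512.07281": 3,
              "2402.01463": 4, "2507.20750": 3}
post_totals = {"2512.17304": 11, "2504.10510": 10, "2512.07281": 9,
               "2402.01463": 12, "2507.20750": 8}

max_total = MAX_PER_PAPER * len(pre_totals)   # 75
pre_sum = sum(pre_totals.values())            # 18
post_sum = sum(post_totals.values())          # 50
delta = post_sum - pre_sum                    # +32
headroom = max_total - pre_sum                # 57 points of possible gain

print(f"pre:  {pre_sum}/{max_total} = {pre_sum / max_total:.0%}")
print(f"post: {post_sum}/{max_total} = {post_sum / max_total:.0%}")
print(f"gain: +{delta} of {headroom} possible = {delta / headroom:.0%}")
```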

Per-Metric Aggregate (across all 5 papers, max 15 each)

Metric	Pre	Post	Delta
M1 · Mode Declared	5/15	15/15	+10
M2 · Halting Witness Applied	0/15	8/15	+8
M3 · A4 Handled Honestly	8/15	11/15	+3
M4 · Deferral Point Specified	5/15	12/15	+7
M5 · PCR Positive	0/15	4/15	+4
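As a cross-check, the per-metric deltas should sum to the overall +32. A short sketch using the aggregate values listed above (the dictionary layout is an illustrative choice):

```python
# Aggregate scores per metric across all five papers (each out of 15).
pre = {"M1": 5, "M2": 0, "M3": 8, "M4": 5, "M5": 0}
post = {"M1": 15, "M2": 8, "M3": 11, "M4": 12, "M5": 4}

deltas = {metric: post[metric] - pre[metric] for metric in pre}
total_delta = sum(deltas.values())
smallest_gain = min(deltas, key=deltas.get)

print(deltas)         # {'M1': 10, 'M2': 8, 'M3': 3, 'M4': 7, 'M5': 4}
print(total_delta)    # 32
print(smallest_gain)  # M3
```

M3 shows the smallest aggregate movement, partly because its baseline was already the highest of the five metrics.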

Honest Finding: M1 went from 5/15 to 15/15, a perfect score, because mode declaration is trivially enforced by the machine's interface. This improvement is real but mechanical. It does not represent improved scientific reasoning, only improved procedural compliance. The null hypothesis is rejected for M1 but the improvement is process, not substance.

Genuine Improvement: M2 went from 0/15 to 8/15. Zero Halting Witness checks were run in the prior analysis. The machine ran at least partial checks on every paper. On Paper 01, H1 fired and blocked downstream energy claims, a substantive intervention that changed the scope of what the analysis was allowed to claim.

Most Important Finding: on Paper 02a (CMB anisotropy), the machine prevented a post-hoc claim from being labeled a prediction. This is the single most important intervention. M5 for that paper went from 0 to 1, not because the machine produced a prediction, but because it correctly blocked a false one and forced a genuine prior-predictive statement to be registered first.

What the Machine Cannot Change: M5 total went from 0 to 4 out of 15. Two papers still scored 0 on PCR. The machine cannot generate predictions that don't exist. For the cosmology Hubble tension paper and the biology paper, PCR = 0 is the correct answer: not a failure of the machine, but a confirmation that no prior predictions exist for these analyses. The machine held the score correctly rather than accepting post-hoc relabelings.

Test Verdict · Null Hypothesis Assessment

The null hypothesis — that the machine produces no measurable change on any of the five metrics — is rejected. Every metric showed improvement above the ±1 threshold on aggregate. The machine is not just procedural decoration: the H1 block on Physics Paper 01 and the A4 enforcement on Cosmology Paper 02a represent substantive interventions that changed what the analysis was allowed to claim.

M1 PROCESS IMPROVEMENT: +10 · M2 SUBSTANTIVE: +8 · M3 MODERATE: +3 · M4 SUBSTANTIVE: +7 · M5 HARD LIMIT ENFORCED: +4 · TOTAL: +32 of 57 possible (56%) · PRE: 24% · POST: 67% · NULL HYPOTHESIS: REJECTED · MOST IMPORTANT INTERVENTION: A4 BLOCKING ON POST-HOC PREDICTION (PAPER 02a) · HONEST CAVEAT: M1 IMPROVEMENT IS PROCEDURAL, NOT SCIENTIFIC