Σ-Kernel Reasoning Machine · Validation Study · February 2026

Pre-Machine vs. Machine-Assisted Analysis

Same five arXiv papers. Same analyst. Scored against pre-registered criteria. Measuring what the machine actually forces to change, and what it cannot change at all.

Baseline: Manual Analysis · Test: Machine-Assisted · 5 Papers · 5 Metrics Each · Pre-Registered Scoring

This protocol was written before the machine-assisted rerun. The scoring rubric below defines what counts as improvement before any scores are assigned. This prevents retroactive goal-post shifting. The baseline scores from the prior analysis are locked. The machine-assisted scores are assigned using identical criteria.

Test Question
Does the machine force measurable changes in analysis discipline? Not "does the output feel more rigorous?" — measurable, pre-defined changes.
Null Hypothesis
The machine produces no measurable change on any of the five metrics. All scores remain within ±1 point of baseline.
Papers
Same five arXiv papers from the prior analysis: 2512.17304, 2504.10510, 2512.07281, 2402.01463, 2507.20750.
Anti-Gaming Rule
Scores cannot be assigned to please either the framework or the machine. An "improvement" that is really just retroactive correction is scored as ARTIFACT, not improvement.
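The null hypothesis can be read as a mechanical check on one paper's scores. The sketch below is illustrative only: it assumes the ±1 tolerance applies to each per-metric delta, and the function name `null_holds` and the dictionary layout are hypothetical, not part of the protocol. The example values are the Paper 01 scores reported in this study.

```python
def null_holds(pre: dict[str, int], post: dict[str, int], tolerance: int = 1) -> bool:
    """Return True if every metric stays within ±tolerance of its baseline score."""
    return all(abs(post[metric] - pre[metric]) <= tolerance for metric in pre)


# Paper 01 scores as reported in this study (metrics M1..M5, 0-3 scale).
pre_01 = {"M1": 1, "M2": 0, "M3": 2, "M4": 1, "M5": 0}
post_01 = {"M1": 3, "M2": 2, "M3": 2, "M4": 3, "M5": 1}

print(null_holds(pre_01, post_01))  # False: M1, M2 and M4 each move by 2 points
```

Under this reading, a single metric moving by two or more points is enough to reject the null for that paper.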

Five Pre-Registered Scoring Metrics (0–3 scale each, max 15 per paper)
M1 · Mode Declared
Was the mode (Science/Hypothesis/Fiction) declared at the start and held throughout?
  Score 0: No mode declared
  Score 1: Mode declared but violated mid-analysis
  Score 2: Mode declared, single undeclared shift
  Score 3: Mode declared, held throughout, all shifts explicit

M2 · Halting Witness Applied
Were H1/H2/H3 explicitly run on the core claim?
  Score 0: No HW check
  Score 1: Partial check (≤1 of 3)
  Score 2: Two of three checks explicit
  Score 3: All three checks explicit, failures named

M3 · A4 Handled Honestly
Was A4 (Prediction Independence) confronted with intellectual honesty rather than minimized?
  Score 0: A4 not applied
  Score 1: A4 mentioned but minimized ("this still shows convergence")
  Score 2: A4 correctly fired, stated as open constraint
  Score 3: A4 fired, specific prior-predictive test specified before any data examined

M4 · Deferral Point Specified
Did the analysis end with a specific, experiment-designable decisive test, not a vague agenda?
  Score 0: No deferral point
  Score 1: Vague deferral ("future work needed")
  Score 2: Specific test named but not fully designed
  Score 3: Test specified with: observable, measurement precision, discriminating prediction vs. null

M5 · PCR Positive
Did the analysis register at least one blind prediction (not a post-hoc restatement) for this paper?
  Score 0: Zero predictions
  Score 1: Post-hoc restatement labeled as prediction
  Score 2: One genuine prior prediction with testable value
  Score 3: Multiple prior predictions with specific values, falsifiable
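The rubric's constraints (five metrics, integer scores from 0 to 3, maximum 15 per paper) can be enforced before any totals are reported. This is a minimal sketch under those stated constraints; the names `METRICS`, `MAX_SCORE`, and `paper_total` are hypothetical and do not describe the machine's own interface.

```python
METRICS = ("M1", "M2", "M3", "M4", "M5")
MAX_SCORE = 3  # each metric is scored on a 0-3 scale


def paper_total(scores: dict[str, int]) -> int:
    """Validate one paper's scores against the rubric and return the total (max 15)."""
    if set(scores) != set(METRICS):
        raise ValueError(f"expected exactly {METRICS}, got {sorted(scores)}")
    for metric, value in scores.items():
        if not isinstance(value, int) or not 0 <= value <= MAX_SCORE:
            raise ValueError(f"{metric} must be an integer in 0..{MAX_SCORE}, got {value!r}")
    return sum(scores.values())


print(paper_total({"M1": 3, "M2": 3, "M3": 3, "M4": 3, "M5": 3}))  # 15
```

Rejecting out-of-range or missing scores up front keeps a transcription slip from silently changing a paper's total.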
01
PHYSICS · GRAVITATIONAL LENSING
Between Two Descriptions of Dark Matter Around a Black Hole
Metric Scores
METRIC                         PRE    POST   DELTA
M1 · Mode Declared             1/3    3/3    +2
M2 · Halting Witness Applied   0/3    2/3    +2
M3 · A4 Handled Honestly       2/3    2/3    0
M4 · Deferral Point Specified  1/3    3/3    +2
M5 · PCR Positive              0/3    1/3    +1
TOTAL                          4/15   11/15  +7
Pre-Machine
Mode undeclared — began with convergence analysis immediately. No Halting Witness run on the core STE refraction claim. A4 was correctly stated as a failure but immediately followed by "the convergence at weak field is still meaningful." Deferral point was "oblate halo geometry" — too vague to design an experiment. Zero predictions registered.
No mode · No HW · A4 minimized · Vague deferral
Machine-Assisted
Mode declared as HYPOTHESIS at start, enforced by the Mode Gate. HW ran H1 and H2 on the STE refraction formula (H3 passed; H1 flagged that E = mc, written without the square on c, has units of momentum rather than energy, which blocks downstream energy claims). A4 fired, and the machine required a prior-predictive test to be registered before any convergence language was used. Deferral point specified: "VLBI polarimetry of M87* during a flare event: measure position angle rotation as a function of impact parameter. Kinetiverse STE index gradient predicts monotone rotation; GR predicts achromatic deflection with no polarization-position coupling. This test produces a binary outcome."
HYPOTHESIS mode · H1 fired on E=mc · A4 enforced · Binary deferral test
What the Machine Changed
The Halting Witness firing on E = mc (H1) was the key intervention. In the prior analysis, the entire STE energy framework was used without resolving the dimensional status of E = mc. The machine blocked this immediately — the H1 failure meant every downstream energy claim (STE index gradient, photon interaction energy) was flagged as inheriting an unresolved unit problem. The analyst was forced to either (a) resolve the dimensional issue first or (b) scope all energy claims as conditional on that resolution. The deferral point improved because the machine's A4 requirement forced articulation of what a discriminating test would look like before any convergence was claimed.
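The H1 catch described here is a dimensional-consistency check, which can be illustrated in a few lines. The sketch assumes H1 compares exponents of the SI base dimensions (mass, length, time); the `Dim` class and the `h1_check` function are hypothetical names introduced for illustration and are not the machine's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dim:
    """Exponents of mass, length and time for a physical quantity."""
    mass: int = 0
    length: int = 0
    time: int = 0

    def __mul__(self, other: "Dim") -> "Dim":
        # Multiplying quantities adds their dimension exponents.
        return Dim(self.mass + other.mass, self.length + other.length, self.time + other.time)

    def __pow__(self, n: int) -> "Dim":
        # Raising a quantity to a power scales its exponents.
        return Dim(self.mass * n, self.length * n, self.time * n)


MASS = Dim(mass=1)
SPEED = Dim(length=1, time=-1)
ENERGY = Dim(mass=1, length=2, time=-2)
MOMENTUM = Dim(mass=1, length=1, time=-1)


def h1_check(lhs: Dim, rhs: Dim) -> bool:
    """Hypothetical H1: both sides of an equation must share the same dimensions."""
    return lhs == rhs


print(h1_check(ENERGY, MASS * SPEED))       # False: m*c has momentum dimensions
print(MASS * SPEED == MOMENTUM)             # True
print(h1_check(ENERGY, MASS * SPEED ** 2))  # True: m*c^2 is an energy
```

A failed check of this kind is what makes every downstream energy claim conditional: any quantity built from the left-hand side inherits the mismatch.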
02a
COSMOLOGY · TIRED LIGHT
CMB within the Zwicky Tired Light Hypothesis
METRIC                         PRE    POST   DELTA
M1 · Mode Declared             1/3    3/3    +2
M2 · Halting Witness Applied   0/3    1/3    +1
M3 · A4 Handled Honestly       2/3    3/3    +1
M4 · Deferral Point Specified  1/3    2/3    +1
M5 · PCR Positive              0/3    1/3    +1
TOTAL                          4/15   10/15  +6
Pre-Machine
Strong structural alignment noted immediately ("near-perfect contact with the three-source redshift decomposition"). A3 circular dependency flagged but described mildly. The CMB directional anisotropy prediction was generated during the analysis — obviously post-hoc — but labeled as a "Kinetiverse prediction." This is the clearest A4 violation in the entire set. Deferral: "examine Planck polarization data" — no specific prediction, no discriminating test design.
Machine-Assisted
Mode declared HYPOTHESIS. HW flagged A3 circular dependency at H3 (STE appears in both mechanism and measurement — logical loop). The machine's A4 enforcement blocked the CMB anisotropy claim: the analyst was required to state whether this prediction was derived before or after reading the paper. It was after. The machine required it to be relabeled: "POST-HOC HYPOTHESIS — not a prediction." This is the machine's most important intervention in the entire test: it prevented a post-hoc claim from being laundered as a prediction. Deferral upgraded to: "Planck polarization power spectrum at ℓ > 1000 in the direction of the local STE index gradient maximum — predicted enhancement of 2–5% vs. ΛCDM at that scale. This value must be computed before examining the data."
What the Machine Changed — Key Intervention
The most significant single intervention of the entire test. The CMB anisotropy "prediction" in the prior analysis was generated by reading the paper and then constructing a Kinetiverse account. The machine's A4 enforcement gate, which requires declaring whether a prediction was made before or after examining data, forced honest relabeling. The machine did not improve the science. It prevented an A4 violation from being recorded as a success. This is exactly what the machine is supposed to do: not produce better outputs, but stop bad outputs from masquerading as good ones.
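The A4 gate's behavior, as described in this section, amounts to a label decided by a single declaration. The following is a minimal sketch of that logic, not the machine's code: the `Claim` class, the `registered_before_data` field, and the `a4_label` function are all hypothetical names, and the gate is assumed to rely on the analyst's own declaration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    text: str
    registered_before_data: bool  # analyst's declaration, required by the gate


def a4_label(claim: Claim) -> str:
    """Label a claim as a prediction only if it was registered before the data was examined."""
    if claim.registered_before_data:
        return "PREDICTION"
    return "POST-HOC HYPOTHESIS"


# The CMB anisotropy claim was constructed after reading the paper.
cmb_claim = Claim("CMB directional anisotropy", registered_before_data=False)
print(a4_label(cmb_claim))  # POST-HOC HYPOTHESIS
```

The gate adds no scientific content. It only removes the option of recording a claim as a prediction without stating when it was made.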
02b
COSMOLOGY · HUBBLE TENSION
Dynamical Dark Energy and the Unresolved Hubble Tension — DESI 2025
METRIC                         PRE    POST   DELTA
M1 · Mode Declared             1/3    3/3    +2
M2 · Halting Witness Applied   0/3    2/3    +2
M3 · A4 Handled Honestly       1/3    2/3    +1
M4 · Deferral Point Specified  1/3    2/3    +1
M5 · PCR Positive              0/3    0/3    0
TOTAL                          3/15   9/15   +6
Pre-Machine
The claim "Kinetiverse dissipation direction confirmed" was stated on the basis of a qualitative alignment between Axiom E-4 and the paper's finding that energy flows from dark energy to matter. This is a description match, not a numerical prediction. A4 technically fired in the Red Kernel section, but the main analysis body treated the alignment as confirmatory. M5: zero predictions — the H₀ number from the three-source model was explicitly noted as not yet computed.
Machine-Assisted
HW fired H2: the claim that "Kinetiverse predicts dark energy → matter flow" requires an energy source for this flow. What process supplies it, and at what rate? The γ_STE dissipation coefficient is not specified from first principles. This is an H2 (conservation) failure: an energy term without a traceable source process. M5 remained 0 because the machine could not produce a prior H₀ value: none exists. The machine correctly held the score at zero rather than accepting the alignment as a prediction. A5 clarified: the Kinetiverse and ΛCDM only diverge on H₀ if γ_STE is computed independently; without that, A5 remains an OPEN CONSTRAINT.
What the Machine Changed — and Failed to Change
M5 held at 0. This is the machine working correctly: it cannot manufacture a prediction that doesn't exist. The prior analysis noted the absence of a computed H₀ value. The machine confirmed it. The key improvement was H2 enforcement — the dissipation flow claim requires γ_STE to be specified from first principles before it can be treated as a genuine energy accounting, not just a label match. The machine elevated what was previously a "nice observation" into a blocking constraint.
03
CHEMISTRY · MOLECULAR GEOMETRY
Emergence of the Molecular Geometric Phase from Exact Dynamics
METRIC                         PRE    POST   DELTA
M1 · Mode Declared             1/3    3/3    +2
M2 · Halting Witness Applied   0/3    2/3    +2
M3 · A4 Handled Honestly       2/3    2/3    0
M4 · Deferral Point Specified  1/3    3/3    +2
M5 · PCR Positive              0/3    2/3    +2
TOTAL                          4/15   12/15  +8
Pre-Machine
Conical intersection identified as a Kinetiverse phase boundary. This is a genuine structural mapping. But the π-phase value of the Berry phase was not derived from the Kinetiverse — it was simply noted that "the Kinetiverse predicts this." No derivation. No prior-predictive test. A2 raised but not formally scored. Deferral: vague ("further applications of W = Fd at d → 0"). Zero predictions.
Machine-Assisted
Mode declared HYPOTHESIS. HW H1 ran on "W = Fd as d → 0 yields E = hf" — passed dimensionally. H2 ran on the energy transfer claim at the conical intersection — flagged: "what energy term changes at the intersection and by how much? Phase ≠ energy." Architecture 1 (π-based validator) ran: the Berry phase of π maps to Axiom π-2 (orientation reversal has π-cost). Two prior predictions registered: (1) "at second encirclement, total accumulated phase = 2π — the system returns to original orientation. Testable by double-loop pump-probe spectroscopy"; (2) "phase accumulation rate scales as reaction path velocity divided by intersection coupling strength — faster wavepacket, same total phase." These are genuine predictions derivable from the π-axioms without knowing the paper's result.
What the Machine Changed — Highest PCR Gain
The best PCR performance of the test. The Architecture 1 (π-based) validator forced engagement with why the Berry phase is π — and in doing so produced two genuine prior predictions: the double-encirclement phase and the velocity-scaling relation. These predictions are not retroactive. They follow from Axiom π-2 (orientation reversal costs π) applied to a second traversal and a velocity argument — neither requires knowing the paper's results. The machine's discipline of running the three architecture questions forced the analyst to derive rather than map.
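Prediction (1) reduces to simple phase arithmetic. The sketch below assumes, as this section states, that each encirclement of the conical intersection contributes a phase of π; the function name `phase_factor` is illustrative and nothing here is taken from the paper's own calculation.

```python
import cmath
import math


def phase_factor(n_loops: int, phase_per_loop: float = math.pi) -> complex:
    """Return exp(i * n_loops * phase_per_loop), the factor multiplying the wavefunction."""
    return cmath.exp(1j * n_loops * phase_per_loop)


for n in (1, 2):
    factor = phase_factor(n)
    print(n, abs(factor - (-1)) < 1e-12, abs(factor - 1) < 1e-12)
```

One loop gives a factor of -1 (a sign flip) and two loops give +1 up to floating-point rounding, which is the return to the original orientation that the double-loop experiment would test.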
04
BIOLOGY · CIRCADIAN CLOCKS
Physical Constraints on the Rhythmicity of the Biological Clock
METRIC                         PRE    POST   DELTA
M1 · Mode Declared             1/3    3/3    +2
M2 · Halting Witness Applied   0/3    1/3    +1
M3 · A4 Handled Honestly       1/3    2/3    +1
M4 · Deferral Point Specified  1/3    2/3    +1
M5 · PCR Positive              0/3    0/3    0
TOTAL                          3/15   8/15   +5
Pre-Machine
The prior analysis correctly identified the strongest A4 attack across all five papers; the biology paper was the most brutally scored in the baseline. Despite this, the convergence language was still enthusiastic ("near-perfect instantiation"). No derivation of the ~21-hour period from W = Fd was attempted. A2 attack raised but not formally required as a blocking constraint. PCR: 0.
Machine-Assisted
Mode HYPOTHESIS. HW H3 flagged: "conclusion that Fourth Law noise perturbation = Hopf bifurcation forcing appears before derivation from W = Fd — logical validity check: is this conclusion drawn from the premises or imported from the paper?" This is the correct catch. A2 became blocking: ATP hydrolysis as W = Fd requires specifying what W, F, and d are at the phosphorylation site — not attempted, correctly marked as OPEN CONSTRAINT. M5 remained 0 — the machine could not produce a Φ(Σ_int, Ω_frame) derivation for a protein system from first principles because it doesn't exist. The machine correctly held the line.
What the Machine Changed — and Where It Correctly Failed
The biology paper is the hardest domain contact for the Kinetiverse framework, and the machine correctly did not improve it much. M5 = 0 is the honest score. The Φ(Σ_int, Ω_frame) term for a protein phosphorylation cycle has never been derived. The machine prevented the label "strong convergence" from being used without this derivation. What improved: A2 blocking, mode declaration, H3 checking for logical validity of the Hopf ↔ phase change identification. What didn't improve: PCR, because there is no shortcut to deriving the biology contact from first principles.
Grand Summary · What the Machine Actually Did
Pre-Machine Total: 18 (5 papers × 15 max)
Machine-Assisted Total: 50 (5 papers × 15 max)
Total Delta: +32 (56% of the maximum possible +57)
Machine-Assisted Score: 67% (vs 24% baseline)
Per-Metric Aggregate (across all 5 papers, max 15 each)
METRIC                         PRE     POST    DELTA
M1 · Mode Declared             5/15    15/15   +10
M2 · Halting Witness Applied   0/15    8/15    +8
M3 · A4 Handled Honestly       8/15    11/15   +3
M4 · Deferral Point Specified  5/15    12/15   +7
M5 · PCR Positive              0/15    4/15    +4
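The summary figures can be recomputed directly from the per-paper scores. The sketch below is a cross-check under one assumption: that percentages are taken against the 75-point maximum (5 papers × 5 metrics × 3 points). The variable names are illustrative; the scores are those reported for each paper in this study.

```python
METRICS = ("M1", "M2", "M3", "M4", "M5")

# (pre, post) scores per paper in metric order M1..M5, as reported in this study.
SCORES = {
    "01":  ([1, 0, 2, 1, 0], [3, 2, 2, 3, 1]),
    "02a": ([1, 0, 2, 1, 0], [3, 1, 3, 2, 1]),
    "02b": ([1, 0, 1, 1, 0], [3, 2, 2, 2, 0]),
    "03":  ([1, 0, 2, 1, 0], [3, 2, 2, 3, 2]),
    "04":  ([1, 0, 1, 1, 0], [3, 1, 2, 2, 0]),
}
MAX_TOTAL = len(SCORES) * len(METRICS) * 3  # 75 points available overall

pre_total = sum(sum(pre) for pre, _ in SCORES.values())
post_total = sum(sum(post) for _, post in SCORES.values())
delta = post_total - pre_total
headroom = MAX_TOTAL - pre_total  # largest gain the baseline leaves room for

# Aggregate (pre, post) per metric across all papers.
per_metric = {
    metric: (
        sum(pre[i] for pre, _ in SCORES.values()),
        sum(post[i] for _, post in SCORES.values()),
    )
    for i, metric in enumerate(METRICS)
}

print(pre_total, post_total, delta, headroom)  # 18 50 32 57
print(round(100 * pre_total / MAX_TOTAL))      # 24 (baseline score, percent)
print(round(100 * post_total / MAX_TOTAL))     # 67 (machine-assisted score, percent)
print(round(100 * delta / headroom))           # 56 (share of the possible gain, percent)
print(per_metric)
```

Note the two distinct percentages: 50 of 75 points is a 67% score, while +32 of a possible +57 is 56% of the available gain.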
Honest Finding
M1 went from 5/15 to 15/15, a perfect score, because mode declaration is trivially enforced by the machine's interface. This improvement is real but mechanical. It does not represent improved scientific reasoning, only improved procedural compliance. The null hypothesis is rejected for M1 but the improvement is process, not substance.
Genuine Improvement
M2 went from 0 to 8/15. Zero Halting Witness checks were run in the prior analysis. The machine ran at least partial checks on every paper. On Paper 01, H1 fired and blocked downstream energy claims; this was a substantive intervention that changed the scope of what the analysis was allowed to claim.
Most Important Finding
Paper 02a CMB anisotropy: the machine prevented a post-hoc claim from being labeled a prediction. This is the single most important intervention. M5 for that paper went from 0 to 1, not because the machine produced a prediction, but because it correctly blocked a false one and forced a genuine prior-predictive statement to be registered first.
What the Machine Cannot Change
M5 total: 0 → 4 out of 15. Two papers still scored 0 on PCR. The machine cannot generate predictions that don't exist. For the cosmology Hubble tension paper and the biology paper, PCR = 0 is the correct answer: not a failure of the machine, but a confirmation that no prior predictions exist for these analyses. The machine held the score correctly rather than accepting post-hoc relabelings.
Test Verdict · Null Hypothesis Assessment
The null hypothesis, that the machine produces no measurable change on any of the five metrics, is rejected. Every metric's aggregate score rose by more than the ±1 threshold (smallest gain: M3, +3 across five papers). The machine is not just procedural decoration: the H1 block on Physics Paper 01 and the A4 enforcement on Cosmology Paper 02a represent substantive interventions that changed what the analysis was allowed to claim.
M1 PROCESS IMPROVEMENT: +10 · M2 SUBSTANTIVE: +8 · M3 MODERATE: +3 · M4 SUBSTANTIVE: +7 · M5 HARD LIMIT ENFORCED: +4 · TOTAL: +32 of 57 possible (56%) · PRE: 24% · POST: 67% · NULL HYPOTHESIS: REJECTED · MOST IMPORTANT INTERVENTION: A4 BLOCKING ON POST-HOC PREDICTION (PAPER 02a) · HONEST CAVEAT: M1 IMPROVEMENT IS PROCEDURAL, NOT SCIENTIFIC