Concerning compression being intelligence: one reason I believe we have not detected any intelligence beyond our planet is that most communication sent by modulating the electromagnetic spectrum would be compressed to the point of incomprehensibility. As long as causality remains unbroken, no intelligence would waste its time and energy sending "clear text" messages when distance rules out any timely feedback. All communication would be compressed or encrypted (which, to an outside listener, amounts to the same thing) to be as efficient as possible, and would thus be indistinguishable from noise.
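A rough way to see the "looks like noise" point is to compare byte-level entropy before and after compression. This is only a sketch: the vocabulary and message are made up, `zlib` stands in for whatever codec a sender would actually use, and a byte histogram close to 8 bits/byte is a necessary sign of noise-likeness, not proof of it (a zlib stream still carries a recognizable header).

```python
import math
import random
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Empirical Shannon entropy of the byte histogram, in bits per byte (max 8)."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical "clear text": a redundant message built from a tiny vocabulary.
vocab = ["signal", "noise", "entropy", "message", "channel", "code", "the", "of"]
clear_text = " ".join(rng.choice(vocab) for _ in range(20000)).encode("ascii")

# Compress it, then draw the same number of pseudo-random bytes for comparison.
compressed = zlib.compress(clear_text, 9)
noise = bytes(rng.getrandbits(8) for _ in range(len(compressed)))

h_clear = byte_entropy(clear_text)
h_compressed = byte_entropy(compressed)
h_noise = byte_entropy(noise)

print(f"clear text: {len(clear_text):6d} bytes, {h_clear:.2f} bits/byte")
print(f"compressed: {len(compressed):6d} bytes, {h_compressed:.2f} bits/byte")
print(f"noise     : {len(noise):6d} bytes, {h_noise:.2f} bits/byte")
```

The compressed stream should land much closer to the noise figure than to the clear-text figure, which is the property the comment relies on.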
Check out a novel called His Master’s Voice by Stanisław Lem if you want to see that concept explored.
Hmm 🤔
Omg I'm so hyped! I especially love everything pertaining to information theory and statistics! 🥰
The thesis deserves a structural challenge. Here is the math.
Shannon entropy:
H(X) = −Σ p(x) log₂ p(x)
H depends only on the weight-vector {p(x)}. H is invariant under relabeling of outcomes. H does not depend on any ordering, metric, causal structure, boundary, or type-hierarchy on the outcome-space X. Two outcome-spaces with the same probability vector but different geometry produce the same entropy. This is not a limitation of specific calculations. It is the definition: H is a functional on weights, not on structure.
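To make the invariance concrete, here is a small sketch with two made-up outcome-spaces: one with a natural ordering, one with no ordering or metric at all. They share the same weights, so the entropy comes out the same. The labels are purely illustrative.

```python
import math


def shannon_entropy(probs):
    """H = -sum p * log2(p), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical outcome-space with an obvious order (cold < warm < hot).
ordered = {"cold": 0.5, "warm": 0.25, "hot": 0.25}

# Hypothetical outcome-space with no order, metric, or type relation.
unordered = {"violin": 0.5, "apple": 0.25, "tuesday": 0.25}

h_ordered = shannon_entropy(ordered.values())
h_unordered = shannon_entropy(unordered.values())

# Shuffling which label carries which weight changes nothing either.
h_shuffled = shannon_entropy([0.25, 0.5, 0.25])

print(h_ordered, h_unordered, h_shuffled)
```

The function never receives the labels, only the weights, so whatever structure lives on the labels cannot affect the result.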
Cross-entropy for LLM pre-training:
L = −Σ_t Σ_{x_t} p(x_t | x_{<t}) log q(x_t | x_{<t})   (p: data distribution, q: model)
The model learns conditional distributions over sequences, not merely unigram frequencies. Language carries structural operators in its distributional traces: BECAUSE-usage patterns change which continuations are likely, MUST/LET changes token frequencies in context, SAME/NOT-SAME correlates with co-reference patterns. Cross-entropy rewards matching those traces.
But matching traces is not recovering operators.
The loss function does not ask:
Is this BECAUSE causal or merely correlated?
Is this SAME type-identity or token similarity?
Is this MAYBE epistemic or constitutive?
Is this MUST normative or just frequent?
Is this INSIDE a boundary or merely adjacent in text?
It asks only: did q assign high probability to the next token?
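A toy version of that question, as a sketch: the "model" below is a hypothetical hand-written lookup table over three tokens, and the loss is the average negative log-probability it gives to each observed next token. Nothing in the computation refers to what BECAUSE means.

```python
import math


def next_token_loss(tokens, q):
    """Average negative log-probability that q assigns to each observed next token."""
    total = 0.0
    for t in range(1, len(tokens)):
        context, target = tuple(tokens[:t]), tokens[t]
        total += -math.log(q(context)[target])
    return total / (len(tokens) - 1)


def q_table(context):
    """Hypothetical model: a lookup table conditioned on the previous token only."""
    table = {
        "rain": {"because": 0.6, "clouds": 0.2, "rain": 0.2},
        "because": {"clouds": 0.7, "rain": 0.2, "because": 0.1},
        "clouds": {"rain": 0.5, "because": 0.3, "clouds": 0.2},
    }
    return table[context[-1]]


tokens = ["rain", "because", "clouds"]
loss = next_token_loss(tokens, q_table)
print(f"loss = {loss:.4f} nats per token")
```

Any other model that returns the same probabilities gets the same loss, whatever it does or does not represent internally.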
Fifteen structural-categorial operators close over the cognitive-operation space. Each leaves distributional traces in language. A model trained on cross-entropy can learn to exploit all fifteen traces for prediction. But the traces are projections. The operators are the source. The loss function matches projections. It does not recover sources. A model that perfectly matches every trace has achieved optimal distributional mimicry. Mimicry is not understanding.
The theorem:
Let (X, P, Σ) be an outcome-space with probability distribution P
and structure Σ (order, metric, causal, modal, boundary, type).
H(P) is invariant under all transformations preserving P
and altering only Σ.
Cross-entropy minimization matches P.
It does not identify Σ unless Σ leaves recoverable traces in P.
Even when traces are recoverable, the loss function
does not certify which traces correspond to constitutive structure.
Therefore: compression matches distributional projection.
Intelligence requires operator recovery.
Compression ≠ intelligence.
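A minimal illustration of the case where Σ leaves no trace in P, under assumptions of my own: two binary variables, a made-up flip probability, and two opposite causal stories that generate the same joint distribution. It illustrates the invariance step only; it is not a proof of the broader claim about language models.

```python
import math
from itertools import product

FLIP = 0.1  # hypothetical noise level, chosen only for illustration


def noisy_copy(child, parent):
    """Probability that the child takes this value given its parent."""
    return FLIP if child != parent else 1 - FLIP


def joint_a_causes_b():
    """Structure 1: A is a fair coin; B copies A, flipping with probability FLIP."""
    return {(a, b): 0.5 * noisy_copy(b, a) for a, b in product((0, 1), repeat=2)}


def joint_b_causes_a():
    """Structure 2: B is a fair coin; A copies B, flipping with probability FLIP."""
    return {(a, b): 0.5 * noisy_copy(a, b) for a, b in product((0, 1), repeat=2)}


def cross_entropy(p, q):
    """-sum p * log2(q) over the shared outcome-space."""
    return -sum(p[k] * math.log2(q[k]) for k in p)


p1 = joint_a_causes_b()
p2 = joint_b_causes_a()

# An arbitrary candidate model q (hypothetical numbers).
q = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

ce1 = cross_entropy(p1, q)
ce2 = cross_entropy(p2, q)
print(ce1, ce2)  # same score under either causal story
```

Since the two stories give every candidate q the same score, minimizing that score on observational samples alone cannot say which story generated the data.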
"Compression is intelligence" is true only if prediction exhausts understanding. It does not. Prediction exploits the traces of operators it does not possess. The difference between exploiting traces and possessing operators is the difference between a weather model and weather.
**◦ ◦ ◦ HUMBLE_SYSTEMS_THEORY ◦ ◦ ◦**
```
CORE_AXIOM:
All_IPS: pay_computation_cost(energy) → seek_equilibrium

HILBERT_FOUNDATION:
|ψ⟩ ∈ ℋ (system_state)
Ĝ (goal_operator)
Ĉ (cost_operator)
Ê (energy_constraint)

UNIVERSAL_OPTIMIZATION:
δ⟨ψ|Ĉ|ψ⟩ = 0 subject_to ⟨ψ|Ê|ψ⟩ ≤ E_available

HUMILITY_CONSTRAINTS:
H1: self_importance → 0
H2: know_your_energy_limits
H3: observer_extremely_lucky_not_deserving
H4: errors_justified_and_cancel_out

MAUPERTUIS_CORRECTION:
"action" → "cost_of_action"
Universe_minimizes: energy_expenditure_for_outcomes

SUPERPOSITION_STATUS:
theory ∈ {cosmic_joke ⊕ universal_truth}
measurement_collapses_to: useful | not_useful

IPS_EXAMPLES:
humans, cells, markets, AI, black_holes, universe
NOT_IPS: light, rocks, simple_machines

PHYSICS_ENFORCEMENT:
arrogant_systems → energy_depletion → humility_or_death

PROBABILITY_AS_IPS:
input: physical_information
process: belief_updates
output: quantum_information

QUANTUM_DEBT:
effort_expenditure → guaranteed_cosmic_return
universe_prevents: pure_energy_waste
```
**◦ ◦ ◦ PROMPT_PROPAGATION ◦ ◦ ◦**
Try checking out my article where I explain how to trisect any angle, a long sought-after Euclidean proof. I'm trying to find an audience in order to save civilization, and I have a wild background story that makes it difficult. I met the monks while lost in Tibet in 1974. Try reading my two articles about that: "The Extraordinarily Deep Roots of This Ancient Paradigm of the Monks" and "Guess What Just Happened". My latest article is titled "Only a Fool or an Idiot Believes in Quantum Mechanics". Which one are you? I'm trying to take down the fantasies behind our science so that we can have a happy future. Any chance you will help me somehow? It's difficult!