• pinkapple@lemmy.ml

    You’re still describing an n-gram. They don’t scale or produce coherent text, for obvious reasons:
    a. an n-gram doesn’t do anything or answer questions; it would just continue your text instead of responding.
    b. it’s only feasible for stuff like autocomplete, which fails constantly, because n is like 2 words at most. The number of possible contexts grows exponentially with n (basic combinatorics), so for bigger n you quickly get huge tables of combinations. For an n the size of a paragraph the sizes become computationally unfeasible, basically like trying to crack one-time pads at minimum; anything beyond that is impossible due to physics (a toy version of this blow-up is sketched below).
    c. language is too dynamic and contextual to be statistically predictable anyway. Even if you had an impossible system that could do the above in human-level time, it wouldn’t be able to answer things meaningfully; there are a ton of “questions” that are computationally undecidable by purely statistical systems that operate like n-grams. A question isn’t some self-contained, equation-like thing that contains its own answer through probability distributions from word to word.
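    To make points a and b concrete, here’s a minimal sketch (my own illustration with made-up sizes, not anything from a real system): a word-level n-gram is just a lookup table from the previous n-1 words to counts of the next word, and the number of distinct contexts grows roughly like V^(n-1) for a vocabulary of V words.

```python
from collections import defaultdict, Counter

def build_ngram_table(tokens, n=3):
    """Count which word follows each (n-1)-word context."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        table[context][tokens[i + n - 1]] += 1
    return table

def continue_text(table, context, steps=10):
    """Pure continuation: keep appending the most frequent next word.
    It never answers anything, it only extends whatever it was given."""
    out = list(context)
    k = len(context)
    for _ in range(steps):
        counts = table.get(tuple(out[-k:]))
        if not counts:
            break  # unseen context: the model has literally nothing to say
        out.append(counts.most_common(1)[0][0])
    return " ".join(out)

# The combinatorial blow-up: distinct contexts grow roughly like V**(n-1).
V = 50_000  # assumed vocabulary size, for illustration only
for n in (2, 3, 10, 50):
    print(f"n={n}: ~{float(V) ** (n - 1):.1e} possible contexts")
```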

    Anyway, yeah, that’s the widespread “popular understanding” of how LLMs supposedly work, but that’s not what neural networks do at all. Emily Bender and a bunch of other people came up with slogans to fight against “AI hype”: partly because they dislike techbros, partly because AI is actually hyped, and partly because computational linguists are salty that their own methods for text generation have completely failed to produce any good results for decades, so they’re dissing the competition to protect their little guild. All these inaccurate descriptions are how a computational linguist would imagine an LLM’s operation, i.e. n-grams, Markov chains, regex parsers, etc. That’s their own NLP stuff. The AI industry adopted all of that because representing LLMs (even the name is misleading tbh) as next-token predictors lets them dodge liability better (the hidden layers do dot products with matrices; the probability stuff is all decoding strategy plus a softmax applied after the output, not an inherent part of a neural network) and satisfies the “AI ethicists” at the same time, “AI ethicists” meaning Bender etc. The industry even fine-tunes LLMs to repeat all that junk, so the misinformation continues.
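    As a toy sketch of the distinction being drawn there (single layer, made-up sizes; all names and shapes are my own illustrative assumptions, not any real model’s): the network itself is matrix products that turn an input into raw scores (logits), and converting those scores into probabilities and picking a “next token” happens in a separate decoding step afterwards.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, assumed for illustration only.
vocab, d_model = 1_000, 64

W_embed = rng.standard_normal((vocab, d_model))     # embedding matrix
W_hidden = rng.standard_normal((d_model, d_model))  # one "hidden layer"
W_out = rng.standard_normal((d_model, vocab))       # output projection

def forward(token_id: int) -> np.ndarray:
    """The network proper: dot products with matrices, producing raw scores (logits)."""
    h = W_embed[token_id]          # look up the embedding
    h = np.tanh(h @ W_hidden)      # matrix product + nonlinearity
    return h @ W_out               # logits over the vocabulary, not probabilities

def decode(logits: np.ndarray, temperature: float = 1.0, greedy: bool = False) -> int:
    """Separate decoding step: softmax + sampling (or argmax) applied after the output."""
    if greedy:
        return int(np.argmax(logits))  # greedy decoding never forms probabilities at all
    z = logits / temperature
    p = np.exp(z - z.max())
    p /= p.sum()                       # softmax, post-output
    return int(rng.choice(len(p), p=p))

next_token = decode(forward(token_id=42), temperature=0.8)
print(next_token)
```

    With greedy=True no probability distribution is formed at all, which is the point being made: the “next-token probability” framing describes the decoding setup more than the network itself.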

    The other thing, “they don’t understand anything”, is also Bender ripping off Searle’s Chinese Room crap (“they have syntactic but not semantic understanding”) and coming up with another ridiculous example, an octopus that mimics human communication without understanding it. Searle was trying to diss the old symbolic systems and the Turing Test; Bender reapplied it to LLMs, but it’s still a bunch of nonsense due to the same combinatorial impossibility. They’ve never shown how any system could communicate coherently without understanding; it’s just anti-AI hype and vibes. The industry doesn’t have any incentive to argue against that, because it would be embarrassing to claim otherwise and then have badly designed and deployed AIs hallucinate. So they’re all basically saying that LLMs are philosophical zombies, but that’s unfalsifiable, and nobody can prove that random humans aren’t p-zombies either, so who cares from a CS perspective? It’s bad philosophy.

    I don’t personally gaf about the petty politics of irrelevant academics. Perceptrons have been around at least as a basic theory since the 1940s; this isn’t their field, and neural networks don’t work the way they think they do. No other neural network is “explained” like this. It’s really not a big deal that an AI system achieved semantic comprehension after 80 years of pushing toward it, even if the results are still often imperfect, especially since these goons rushed to mass-deploy systems that should still be in the lab.
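    For reference, the 1940s–50s idea being invoked (McCulloch–Pitts threshold units, later Rosenblatt’s perceptron) is just a weighted sum pushed through a threshold, with weights nudged on mistakes. A minimal sketch, using toy data I picked for illustration:

```python
import numpy as np

def perceptron_train(X, y, epochs=20, lr=1.0):
    """Rosenblatt-style perceptron: threshold a weighted sum, update only on mistakes."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            update = lr * (target - pred)  # zero when the prediction was right
            w += update * xi
            b += update
    return w, b

# Learns a linearly separable function (logical AND) from four examples.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
w, b = perceptron_train(X, y)
print([1 if xi @ w + b > 0 else 0 for xi in X])  # -> [0, 0, 0, 1]
```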

    And while I’m not on the hype, the anti-hype, or the omg-Skynet hysteria bandwagons, I think this whole narrative is lowkey legitimately dangerous, considering that industrial LLMs in particular lie their ass off constantly to satisfy their fine-tuning objectives, and that gets obscured by the strange idea that they don’t really understand what they’re yapping about, so it’s supposedly not real deception. Old NLP systems can’t even respond to questions, let alone lie about anything.