ChatGPT is a poorly compressed version of the Internet
A great deal (perhaps even too much) is being written about ChatGPT. One noteworthy contribution is an article published on February 9, 2023, in The New Yorker under the title “ChatGPT Is a Blurry JPEG of the Web” [1].
Its author, Ted Chiang, draws an analogy to a bug in JBIG2, a lossy image-compression format used in Xerox copiers, discovered in 2013. Because of the bug, copies made by certain Xerox machines were not blurry but perfectly readable, and yet wrong: similar image elements (such as digits) were falsely classified as identical and stored as a single object in order to save space, so characters were silently substituted. The copies gave the impression of being accurate but were in fact corrupted. A blurry but truthful image would be preferable to a sharp one that feigns correctness merely by being readable. For that reason, a better title for the article would have been “ChatGPT Isn't a Blurry JPEG of the Web”.
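The failure mode can be sketched in simplified form (this is not actual JBIG2; the tiny glyph bitmaps and the similarity threshold are invented for illustration):

```python
def compress(symbols, distance, threshold):
    """Store each symbol once; map later symbols to the first stored
    symbol that lies within `threshold` of it (pattern matching)."""
    dictionary, indices = [], []
    for s in symbols:
        for i, d in enumerate(dictionary):
            if distance(s, d) <= threshold:
                indices.append(i)  # reuse the stored object
                break
        else:
            indices.append(len(dictionary))
            dictionary.append(s)
    return dictionary, indices

def decompress(dictionary, indices):
    return [dictionary[i] for i in indices]

def hamming(a, b):
    """Number of differing pixels between two bitmaps."""
    return sum(x != y for x, y in zip(a, b))

# Two digits as hypothetical 3x3 bitmaps that differ in a single pixel.
glyphs = {
    "6": (1,1,1, 1,0,0, 1,1,1),
    "8": (1,1,1, 1,1,0, 1,1,1),
}

page = [glyphs["6"], glyphs["8"]]
dictionary, indices = compress(page, hamming, threshold=1)
restored = decompress(dictionary, indices)
print(restored[1] == glyphs["6"])  # prints True: the 8 came back as a 6
```

The restored page is pixel-sharp and readable, but the second digit has been replaced, which is exactly why the sharpness feigns a correctness that is not there.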
Chiang views the data model underlying ChatGPT as a highly compressed version of the web stored on OpenAI's servers. Such compression must be powerful, and it is lossy: the original content cannot be reconstructed from the compressed form. ChatGPT is a user interface that accepts questions and generates answers from the stored data. Chiang's analogy is apt insofar as lossy image decompressors apply interpolation to make the image look sharper, and the underlying transformers do something analogous for text. Transformers are large neural networks, first described in 2017 by a Google team and other researchers [2]. The analogy is also reminiscent of word2vec by Tomas Mikolov, in which a neural network learns to guess a missing word in a passage of text [3].
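The word2vec idea of guessing a missing word from its context can be sketched without any neural network at all, using plain co-occurrence counts; the toy corpus and window size below are invented for illustration:

```python
from collections import Counter, defaultdict

# Invented toy corpus; real word2vec trains on billions of words.
corpus = "hot coffee hot tea cold water cold soda".split()
window = 1  # how many neighbors on each side count as context

# Count how often each word appears near each other word.
context_counts = defaultdict(Counter)
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            context_counts[corpus[j]][w] += 1

def guess_missing(context):
    """Score candidate words by how often they co-occur with the context."""
    scores = Counter()
    for c in context:
        scores.update(context_counts[c])
    return scores.most_common(1)[0][0]

# Which word is most likely to appear next to both "coffee" and "tea"?
print(guess_missing(["coffee", "tea"]))  # prints "hot"
```

Real word2vec replaces these raw counts with a learned neural embedding, which is precisely the kind of "interpolation" that makes the compressed answer look sharper than the evidence warrants.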
Links
[1] Ted Chiang (2023). ChatGPT Is a Blurry JPEG of the Web. The New Yorker, February 9, 2023. https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web
[2] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin (2017). Attention is all you need. In Advances in Neural Information Processing Systems, pages 6000–6010. https://arxiv.org/abs/1706.03762
[3] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean (2013). Efficient Estimation of Word Representations in Vector Space. https://arxiv.org/abs/1301.3781