[Opinion] ChatGPT vs Wolfram Alpha

Adrien Foucart, PhD in biomedical engineering.


This post was originally published on LinkedIn.

A big misconception I see with ChatGPT relates to the way it “knows” data. I’ve seen many statements like “ChatGPT has a knowledge base that ends in 2021” or “it cannot access Google to check its answers”, presented as limitations that may be addressed in future versions. Surely, if it “knows” stuff until 2021, it cannot be that hard to update its knowledge base with more up-to-date information, or even real-time information with a Google-like web scraper in the background, right?

The problem is that ChatGPT doesn’t “know” data until 2021, not in the way that is implied by the idea of “updating” it. If you ask about the birth date of Napoleon, ChatGPT doesn’t have some biographical data that it can query to retrieve the correct information. What it may have is a strong association between the words “Napoleon”, “birth date” and “August 15, 1769”, if they were often close together in its training data, which is very likely. It will also “know” from its training data that a question about a birth date is very likely to require an answer in a date-like format. Putting all of that together, it will probably give the right answer most of the time.

But this is not knowledge from a database: the information is embedded in the parameters of the network. That means that “updating” the information, or adding new information, requires retraining the model. And retraining is (a) very expensive to do, and (b) liable to interfere with some other previously held “information”.
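To make the contrast concrete, here is a minimal Python sketch. It is a toy illustration, not how ChatGPT actually works: the names `knowledge_base`, `lookup`, `train` and `generate` are made up for this example, and the tiny "training corpus" is invented. The first half is a database-style lookup, where a fact is a record you can read or overwrite. The second half is a crude stand-in for a statistical model, which only stores how often a continuation followed a given context.

```python
from collections import Counter

# Database-style knowledge: each fact is an explicit record.
knowledge_base = {("Napoleon", "birth date"): "August 15, 1769"}

def lookup(entity, attribute):
    """Return the stored fact, or None if there is no record."""
    return knowledge_base.get((entity, attribute))

# Toy "model": no records, only counts of which continuation
# followed which context in the (invented) training text.
def train(corpus):
    counts = {}
    for context, continuation in corpus:
        counts.setdefault(context, Counter())[continuation] += 1
    return counts

def generate(model, context):
    """Return the most frequent continuation seen for this context."""
    followers = model.get(context)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = [
    ("Napoleon was born on", "August 15, 1769"),
    ("Napoleon was born on", "August 15, 1769"),
    ("Napoleon was born on", "the island of Corsica"),
]
model = train(corpus)

print(lookup("Napoleon", "birth date"))         # read from a record
print(generate(model, "Napoleon was born on"))  # most frequent association
```

In the first case, correcting or adding a fact is a single assignment to `knowledge_base`. In the second, there is no entry to edit: the only way to change the answer is to change the training text and call `train` again, which is cheap here but is exactly the expensive step for a real model.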

So a ChatGPT-like model will never be able to function as an up-to-date source of information, because you can’t just retrain it regularly with new information scraped from news sources, or Wikipedia, or wherever. Likewise, it cannot just be plugged into a search engine to get the information on-the-fly.

Something closer to this use case, an actual knowledge base that can be queried using natural language, already exists (with its own important limitations): it’s Wolfram Alpha.

If you ask Wolfram Alpha “what is the birthdate of Napoleon”, it will parse the query and, crucially, start by giving you information about how it interpreted it, and the assumptions it made. For instance: that you are referring to “Napoleon” the royal person and not the fictional character from Animal Farm. Then it will give you the result, and then dump some additional related biographical information. This is super important, because it makes it a lot easier to determine whether the information can be trusted or not.
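The idea of returning the interpretation together with the result can be sketched in a few lines of Python. This is only an illustration of the principle, not Wolfram Alpha's actual API or data model: the `Answer` class, the `ENTITIES` table and the `answer` function are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Answer:
    interpretation: str  # how the query was understood
    assumptions: list    # alternatives that were set aside
    result: str          # the fact itself

# Hypothetical entity table: one name, several possible meanings.
ENTITIES = {
    "napoleon": [
        ("Napoleon (royal person)", {"birth date": "August 15, 1769"}),
        ("Napoleon (Animal Farm character)", {}),
    ]
}

def answer(name, attribute):
    """Pick the first candidate and report which ones were discarded."""
    candidates = ENTITIES.get(name.lower(), [])
    if not candidates:
        return None
    chosen, facts = candidates[0]
    assumptions = [
        f"assuming {chosen} rather than {label}"
        for label, _ in candidates[1:]
    ]
    return Answer(
        interpretation=f"{chosen} | {attribute}",
        assumptions=assumptions,
        result=facts.get(attribute, "(no data)"),
    )

reply = answer("Napoleon", "birth date")
print(reply.interpretation)
print(reply.assumptions)
print(reply.result)
```

The point of the design is that the reader sees which “Napoleon” was picked before seeing the date, so a wrong interpretation is visible instead of being hidden inside a fluent sentence.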

Is Wolfram Alpha a perfect source of information? Certainly not. It’s not as good at parsing natural language, its answers are not formatted in nice prose, and it doesn’t attempt to do things like write computer code or poetry. But at least, if it gives you an answer… it’s probably correct. Which in my humble opinion is fairly important if you want to use something as a knowledge source…