Let’s put aside the fact that it necessarily means giving full access
to the content of your conversations to Microsoft (no end-to-end
encryption if Bing needs to listen in to offer its advice!).
I just want to quickly review the examples that Microsoft has decided
to highlight in its own blog posts to showcase the capabilities of its
new superpowered AI. Because… I’m starting to doubt that anyone even
reads the chatbot outputs before sharing them and saying how
wonderful they are.
Let’s start with the first example, in the Bing AI announcement.
I am planning a trip for our anniversary in September. What are some
places we can go that are within a 3 hour flight from London
Relatively simple and straightforward request. We need destinations that are:
Less than a 3-hour flight from London Heathrow.
Suitable / popular for a romantic getaway.
If we ask Google for a “romantic trip less than 3 hours flight from
London”, we get several articles with a lot of possible choices, from inspiringtravel.co.uk,
Not all of those are suitable for the request, but it’s fairly easy to
sift through (almost all include some estimation of the flight time and
what to do there).
What’s Bing AI’s take?
First one is Malaga. The provided information seems correct, and it
fits the request, depending on what you consider suitable for an
anniversary trip. So far, so good.
Second one is… Annecy, in France. Certainly a very nice city, but it
doesn’t have an airport, and is therefore absolutely not within a 3 hour
flight from London Heathrow. Ok. One mistake, let’s move on.
Third one is… Florence, Italy. Which as far as I can tell requires a
minimum flight time of four hours.
That’s the first example they chose to highlight: two of the three
proposed destinations do not fit the prompt. Among the suggestions from
the Google search, you can find Paris, Amsterdam, and plenty of other
destinations which arguably fit the prompt better. Also, you know,
instead of the Venice of France, there is actually a 2h20
flight from London to actual Venice…
Let’s see what Bing
AI in Skype can do. Here, they provide three conversation examples.
In the first one, they ask for some vegetarian recipes, and it delivers.
I don’t really see how it’s easier than Google in this case, but fine.
Second one is about cleaning up a full mailbox, and here again the
results seem fine, but you could put the same thing in Google (or
probably even Bing!) and get it in the first result.
The third one is where we finally see a request that requires some
level of “intelligence”: “what should we do during a layover in Spain?
Food and beaches ideally”. That’s a relatively easy question:
Bing AI can choose from all over Spain, and information about beaches
and restaurants is typically not difficult to find. Can you guess how
well it performs?
The first suggestion is fine… Almost. At first glance. As
far as I can tell (because the reference is hidden in the screenshot)
La Mallorquina in Barcelona is not a pastry shop, it’s a
textile shop. There is a pastry shop called “La Mallorquina Formentor”,
but it doesn’t seem particularly famous. There is a famous pastry shop
called “La Mallorquina” in Spain… but it’s in Madrid. I’m sure Spanish –
and Catalan – users will be delighted to know Bing AI confuses Madrid
with Barcelona.
The second suggestion is incredible. “If you are in Port of Spain”.
Port of Spain. Which is in Trinidad and Tobago, around
6000km from Spain. That is the second option for a layover in Spain from
Bing AI. In the example cherry-picked by Microsoft to showcase how
good their system is. Seriously.
This is absurd. Microsoft may be getting some people to Bing with the
hype, but I really don’t see how they will retain customers if they have
such a high miss rate in their answers.
The fact that a comparatively much smaller mistake made by Bard in
Google’s demo of their competing system apparently had a huge
impact on their stock, while Microsoft’s mistakes (and there are more in
their other demos) don’t seem to have much impact on the hype, either
suggests that tech investors are completely irrational, or that everyone
is so used to Bing being bad that “bad, but with style” is really seen
as an improvement.
Anyway. I can’t wait for this hype cycle to be over so that we can
move beyond large generative models, which seem more and more like a
scientific dead end.