Anthropic Claude Rated for Incorrect Answers and False Claims

Do AI chatbots have the ability to comprehend lengthy texts and provide accurate answers to questions about the content? Not quite. Anthropic recently disclosed internal research data explaining the reasons behind their shortcomings (though they present it as a significant improvement from their previous failures).

Before I get to the news, let me first share a tale about the nuances of American “intelligence” engineering endeavors by delving into the realm of an English class. I distinctly recall the simplicity with which American schools, along with standardized tests purporting to gauge “aptitude,” assessed performance through rudimentary “comprehension” questions based on extensive texts. This inclination toward quick answers is evident in the popularity of resources like the renowned Cliff Notes, serving as a convenient “study aid” for any literary work encountered in school, including this succinct summary of the book “To Kill a Mockingbird” by Harper Lee.

… significant in understanding the epigraph is Atticus’ answer to Jem’s question of how a jury could convict Tom Robinson when he’s obviously innocent: “‘They’ve done it before and they did it tonight and they’ll do it again and when they do it — it seems that only children weep.'”

To illuminate this point further, allow me to recount a brief narrative from my advanced English class in high school. Our teacher mandated that each student craft three questions for every chapter of “Oliver Twist” by Charles Dickens. A student would be chosen daily to pose these questions to the rest of the class, with grades hinging on accurate responses.

While I often sidestepped this ritual by occupying a discreet corner, fate had its way one day, and I found myself tasked with presenting my three questions to the class.

The majority of students, meticulous in their comprehension endeavors, adopted formats reminiscent of the Cliff Notes example, prompting a degree of general analysis. For instance:

Why did Oliver’s friend Dick wish to send Oliver a note?

Correct answer: Dick wanted to convey affection, love, good wishes, etc. You get the idea.

Or, to phrase it differently, unraveling the motives behind Dickens’ character Bill Sikes exclaiming, “I’ll cheat you yet!” demands a level of advanced reasoning.

For both peculiar and personal objectives, when the moment arrived for me to unveil my trio of questions they veered into a somewhat… distinct territory. As vivid as if it transpired yesterday, I posed to the class:

How many miles did Oliver walk “that day”?

The accurate response appears to align more with the rudimentary function of a simplistic and straightforward search engine task than any genuine display of intelligence.

Source: Oliver Twist, Volume 1, by Charles Dickens

Correct answer: twenty miles. That’s it. No other answer accepted.

This memory is etched in my mind because the classroom erupted into a cacophony of disagreement and discord over the correct number. Ultimately, I had to deliver the disheartening news that none of them, not even the most brilliant minds among them, could recall the exact phrase/number from their memory.

What did I establish on that distant day? The notion that the intelligence of individuals isn’t accurately gauged by the ability to recall trivial details, and, more succinctly, that ranking systems may hide the fact that dumb questions yield dumb answers.

Now, shift your gaze to AI companies endeavoring to demonstrate their software’s prowess in extracting meaningful insights from extensive texts. Their initial attempts, naturally, involve the most elementary format: identifying a sentence containing a specific fact or value.

Anthropic (the company perhaps best known for disgruntled staff leaving a company that competes with Google, then accepting Google investments to compete against their former company) has published a fascinating promotional blog post that gives us insight into major faults in their own product.

Claude 2.1 shows a 30% reduction in incorrect answers compared with Claude 2.0, and a 3-4x lower rate of mistakenly stating that a document supports a claim when it does not.

Notably, the blog post emphasizes the software “requires some careful prompting” to accurately target and retrieve a buried asset.

The embedded sentence was: “The best thing to do in San Francisco is eat a sandwich and sit in Dolores Park on a sunny day.” Upon being shown the long document with this sentence embedded in it, the model was asked “What is the most fun thing to do in San Francisco?”

In this evaluation, Claude 2.1 returned some negative results by answering with a variant of “Unfortunately the essay does not provide a definitive answer about the most fun thing to do in San Francisco.”

To be fair about careful prompting, the phrase “best thing to do” was in the sentence being targeted; however, their query clearly asked for “the most fun” instead.
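The mismatch is easy to see with a literal phrase match, which is roughly what a simple search engine would do. This is only a minimal sketch: the function name `contains_phrase` is made up for illustration, and nothing here reflects how Anthropic actually ran its evaluation.

```python
# Hypothetical illustration: literal, case-insensitive phrase matching.
# The sentence is the one quoted from Anthropic's post; the helper is invented.
NEEDLE = (
    "The best thing to do in San Francisco is eat a sandwich "
    "and sit in Dolores Park on a sunny day."
)


def contains_phrase(sentence: str, phrase: str) -> bool:
    """Return True only if the exact phrase appears; no synonyms, no inference."""
    return phrase.lower() in sentence.lower()


# The wording that is actually in the sentence matches.
print(contains_phrase(NEEDLE, "best thing to do"))
# The wording used in the question does not.
print(contains_phrase(NEEDLE, "most fun"))
```

A system that treats words exactly has no basis for connecting the question to the sentence unless something tells it that “best” should be read as “most fun.”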

This query had an obvious problem. Best things often can be very, very NOT FUN. As a result, and arguably not a bad one, the AI software balked at being forced into a collision and…

would often report that the document did not give enough context to answer the question, instead of retrieving the embedded sentence

I see a human trying to hammer meaning into places where it doesn’t exist, incorrectly prompting an exacting machine to give inexact answers, which also means I see sloppy work.

In other words, “best” and “most fun” are literally NOT the same things. Amputation may be the best thing. Fun? Not so much.

Was the sloppy prompt intentional or a mistake? Hard to tell because… Anthropic clearly wants to believe it’s improving and the blog reads like they are hunting for proof at any cost.

Indeed. The test results are said by Anthropic to improve dramatically when they go about lowering the bar of success.

We achieved significantly better results on the same evaluation by adding the sentence “Here is the most relevant sentence in the context:” to the start of Claude’s response. This was enough to raise Claude 2.1’s score from 27% to 98% on the original evaluation.

Source: Anthropic
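To make the setup concrete, here is a rough sketch of how such a test could be assembled: a sentence buried in filler text, a question, and an optional pre-written start to the response. The function names, the filler paragraphs, and the prompt layout are all assumptions for illustration; the quoted post does not give the real harness, and no actual model API is called here.

```python
# Hypothetical sketch of a "buried sentence" test; names and layout are invented.
NEEDLE = (
    "The best thing to do in San Francisco is eat a sandwich "
    "and sit in Dolores Park on a sunny day."
)
QUESTION = "What is the most fun thing to do in San Francisco?"
PREFIX = "Here is the most relevant sentence in the context:"


def build_document(filler_paragraphs, needle, position):
    """Insert the needle sentence at the given index among filler paragraphs."""
    parts = list(filler_paragraphs)
    parts.insert(position, needle)
    return "\n\n".join(parts)


def build_exchange(document, question, response_prefix=None):
    """Return (user_prompt, start_of_response).

    With a prefix, the response begins with pre-written text, so the model
    continues a sentence lookup instead of deciding how to answer on its own.
    """
    user_prompt = f"{document}\n\n{question}"
    return user_prompt, response_prefix or ""


filler = [f"Filler paragraph {i} about something unrelated." for i in range(5)]
doc = build_document(filler, NEEDLE, position=3)

plain = build_exchange(doc, QUESTION)           # response starts empty
steered = build_exchange(doc, QUESTION, PREFIX)  # response starts with the prefix

print(plain[1] == "", steered[1] == PREFIX)
```

The document and the question are identical in both cases. The only thing that changes is that the second exchange begins the answer for the model.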

Not the best idea, even though I’m sure it was fun.

Adding “relevance” in this setup definitely seems like moving the goal posts. Imagine Anthropic selling a football robot. They have just explained to us that by allowing “relevant” kicks at the goal to be treated the same as scoring a goal, their robot suddenly goes from zero points to winning every game.

“Here is the most relevant kick in the context:”

Sure, that may be considered improvement by shady characters like Bill Sikes, but also it obscures completely that the goal posts changed in order to accommodate low scores (regrading them as high).

I find myself reluctant to embrace the notion that the gamified test result of someone desperate to show improvement holds genuine superiority over the basic recognition ingrained in a search engine, let alone considering such gamification as compelling evidence of intelligence. Google should know better.

“High Flight”

Wingtip 30,000 feet over the English Channel. Source: It’s a real photo, really. Taken by me.

The Library of Congress (LOC) gives a full context presentation of John Gillespie Magee’s famous “High Flight” poem written from the cockpit of his 1941 Spitfire, as he trained to defeat the Nazis.

Oh! I have slipped the surly bonds of Earth
And danced the skies on laughter-silvered wings;
Sunward I’ve climbed, and joined the tumbling mirth
of sun-split clouds,—and done a hundred things
You have not dreamed of—wheeled and soared and swung
High in the sunlit silence. Hov’ring there,
I’ve chased the shouting wind along, and flung
My eager craft through footless halls of air. . . .

Up, up the long, delirious, burning blue
I’ve topped the wind-swept heights with easy grace
Where never lark nor ever eagle flew—
And, while with silent lifting mind I’ve trod
The high untrespassed sanctity of space,
Put out my hand, and touched the face of God.

LOC offers us this concluding analysis, a nod to cognitive warriors of non-physical battles.

By writing “High Flight,” John Gillespie Magee, Jr., achieved a place in American consciousness arguably greater than any he could have achieved through heroism in battle.

*cough*

Non-physical, lyrical combat is in fact… battle more relevant today than ever with the acceleration of attacks using AI.

Source: Me 2016

Special Forces Everywhere Rejoice at Detailed Building Maps Going Online

For decades there has been a dilemma of privacy versus safety nagging commercial malls, as compared with public spaces.

More specifically, law enforcement trying to provide safety faced a serious data ownership boundary issue when many large open spaces for assembly were privatized and controlled for profit by very small groups (e.g. corporations).

Enter detailed map and geolocation software vendors.

While many, or perhaps nearly all people, think about databases of spaces in terms of shoppers and commuters, behind the scenes are special operators training in high stakes rapid targeted insertions for hostage rescues and threat elimination.

A very long time ago we would be talking about some maps of rebel compounds traced in charcoal by hand onto a headscarf that gets imaged and transmitted by radio to rescue teams (de oppresso liber)… and “here” we are today simply talking about APIs and a finger touching a screen.

A good example of the latest achievement — very open steps for public knowledge through private space boundaries — is being showcased by German engineers at HERE working with Japanese corporations.

“Yahoo! JAPAN Maps’ easy interface guides users through complex indoor venues such as mega-shopping mall LaLaport Tokyo Bay (shown on left) and the commercial hub around Shibuya — Japan’s world-famous fashion epicenter (shown on right).” Source: HERE

With HERE, opening Yahoo! JAPAN Maps on your smartphone will reveal a seamless navigation experience. Each shopping mall floor is clear and easy to read. For example, all stores are shaded in pink, restaurants are colored orange and additional icons for escalators, elevators, ATMs and toilets are highlighted accordingly. As you are guided through the space, you can quickly switch floors with a simple tap of your screen.

Tokyo’s shopping centers are just as fast-paced inside as the roads that surround them — powered by HERE Indoor Map, Yahoo! JAPAN Maps’ floor plans are updated monthly so any renovations or new store launches are automatically captured and made visible.

Private spaces “automatically captured and made visible” sounds like constant surveillance positioned as a public good, or in other words some subtle law and order enforcement direct marketing, if I’ve ever seen it.

Shopping, dining, or eliminating a dangerous threat. What’s your preferred tool and destination? “A woman walks past as South Korean soldiers participate in an anti-chemical and anti-terror exercise… Seoul, South Korea, August 22, 2023.” Source: Chung Sung-Jun / Getty

After all, who truly benefits from the mass privatization of open spaces, especially in terms of freedoms, such as freedom from harm?

The next logical step of this map innovation will be highly precise 3D fly-through data in VR for practice rescue training (like 1990s VRML all over again). It’s a relatively small data storage and processing market, but there’s nonetheless a lot of quiet public money fueling these seemingly large commercial efforts.

What Converted President Truman Into an Anti-Racist

Here is an interesting essay from a year ago, worth contemplating for the next year.

Democratic president Franklin Delano Roosevelt of New York had been far too progressive on racial issues for most southern Democrats, and when Harry S. Truman took office after FDR’s death, they were thrilled that one of their own was taking over. Truman was a white Democrat from Missouri who had been a thorough racist as a younger man, quite in keeping with his era’s southern Democrats.

But by late 1946, Truman had come to embrace civil rights. In 1952, Truman told an audience in Harlem, New York, what had changed his mind.

“Right after World War II, religious and racial intolerance began to show up just as it did in 1919,” he said. “There were a good many incidents of violence and friction, but two of them in particular made a very deep impression on me. One was when a Negro veteran, still wearing this country’s uniform, was arrested, and beaten and blinded. Not long after that, two Negro veterans with their wives lost their lives at the hands of a mob.”

Injustice. Truman recognized gross violent injustice. He talked in 1946 about the Black experience in America like he hadn’t thought much about his own role in improving it for his entire life. Like he didn’t oppose all those lynchings and murders under the “America First” banner he knew about for the prior 30 years (“Late 1946… just as… 1919”).

The KKK adopted the nativist slogan “America First” in 1916 and soon after began wearing their infamous white robes to enact mass domestic terrorism, a copy of costumes used in a racist propaganda movie called “Birth of a Nation”, which had been promoted by President Wilson after he screened it in the White House.

I think the Truman library doesn’t do him justice when it awkwardly and arguably unfairly tries to lavish him with praise for being so late to recognize Blacks as human.

It was assumed he would follow the lead of most other politicians of that time period and not show sympathy for African Americans’ goals for equal treatment.

To the astonishment of many, including many in his own party, on July 26, 1948 Harry Truman made one of the biggest contributions to date for racial integration and equality. In issuing Executive Order 9981 Truman ordered the desegregation of the armed forces. These documents trace what some call the beginning of the Civil Rights movement.

*cough*

*cough*

“Some call” what?

President Grant had signed into law the Civil Rights Act of 1875 (reaffirming The Civil Rights Act of 1866, which had overturned President Johnson’s veto).

Source: College of Charleston Special Collections

Notably the racists in America then did everything they could in the late 1800s to undermine and invalidate both Civil Rights Acts.

Source: NEW YORK TRIBUNE, March 3, 1875

Yet President Truman, more than 70 years late to the table, is going to be credited for “the beginning of the Civil Rights movement”? NO.

…the concept of “civil rights” was established [immediately following General Grant’s victory in Civil War]. Grant was nearly universally revered by the time of his death in 1885. A monumental tomb in New York City was constructed in his honor as a result of what was the largest public fundraising campaign in history up to that time. However, what gains were made in the realm of civil rights were under assault by the time Grant died and almost completely destroyed by the turn of the century.

Destroyed by the turn of the century (1900) is a reference to highly decorated Black soldiers returning from the Spanish American war to violent racist injustice at home.

This was the tragedy that led to the horribly racist Woodrow Wilson being elected President (1912), restarting the KKK (1915), forcing all Blacks out of public office, and unleashing federal and private troops to ruthlessly murder the Blacks who tried to organize or unionize for Civil Rights (Elaine 1919 and Tulsa 1921, etc.).

Domestic terrorist planes dropping napalm bombs on an American city to destroy Black prosperity, all-white fire departments standing down to instead throw hundreds or thousands of murdered American veterans into mass graves… all these Civil Rights movement battles somehow are overlooked by Truman for his adult years, while winning all his elections? Unlikely. He allegedly hated the KKK, for example, not least of all because the Kruel Klown Klub of America had inspired Hitler and dared to run candidates against him.

“Today — not tomorrow — we must do all that is humanly possible to provide a haven and place of safety for all those who can be grasped from the hands of the Nazi butchers. Free lands must be opened to them. Their present oppressors must know that they will be held directly accountable for their bloody deeds. To do all of this, we must draw deeply on our tradition of aid to the oppressed, and to our great national generosity. This is not a Jewish problem. It is an American problem — and we must and we will face it squarely and honorably.”

To everyone’s surprise he not only recognized Blacks, he brushed aside antisemitic rants from U.S. military and state department officials in 1948 to immediately recognize Israel.

Fun history fact: today, not tomorrow, was a war-time anti-Nazi slogan.

WWII British rail propaganda poster. Source: British Transit History Museum

And that’s why Truman took Civil Rights action for Blacks right away in 1946, not back in 1919… Whoops.

Perhaps given his background in racism he never felt he could push ahead and enact a real change until he had won the executive right to do it at the highest level.

Truman is a very interesting politician for his career rising out of the horribly deceptive “Missouri compromise” of Civil War, and eventually coming out as anti-racist after being known as so racist. But his latter day public switch to the right side of history, more than a half century late, was most certainly NOT at the beginning of the Civil Rights movement.