deepseek-reasoner

20250120_deepseek_deepseek_reasoner_None_n_20_fmt_react_verified_mini

Setup

Model deepseek-reasoner
Temperature N/A
Max Iterations 20
Format react
Max Cost $1.00

% Resolved

50.0%
Resolved 25
Total 50

Cost

$3.82
$/Instance 7.64¢
Resolved/$ 6.54
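
The derived figures above follow from the three reported totals (25 resolved, 50 instances, $3.82). A minimal sketch of that arithmetic, assuming $/Instance is total cost divided by instance count and Resolved/$ is resolved count divided by total cost; the variable names are illustrative and not taken from the harness:

```python
# Totals as reported in the summary above.
resolved = 25
total_instances = 50
total_cost_usd = 3.82

# Assumed definitions of the derived metrics (not confirmed by the harness).
resolved_pct = 100 * resolved / total_instances
cost_per_instance_cents = 100 * total_cost_usd / total_instances
resolved_per_dollar = resolved / total_cost_usd

print(f"% Resolved: {resolved_pct:.1f}%")
print(f"$/Instance: {cost_per_instance_cents:.2f} cents")
print(f"Resolved/$: {resolved_per_dollar:.2f}")
```

Under these assumptions the results round to the reported 50.0%, 7.64¢ and 6.54.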

Token Usage

Total 7.0M
Prompt 5.9M (4.5M)
Completion 1.1M

Status

Final status of instances evaluated with the Moatless EvalTools SWE-Bench Harness. Indicates if the instance was resolved successfully, failed to complete, encountered an error, or didn't generate any patches.

Flags

Flags indicate potential issues in how the LLM follows the agentic workflow. They help identify common failure modes like hallucinations, "stuck in a loop", or missing test verifications.

Instances

50 instances
Status    %    Iterations  Cost    Tokens   Flags
resolved  29%  10          2.85¢   73.2K    failed_actions
failed    43%  14          8.12¢   132.9K   failed_actions
resolved  27%  7           7.37¢   73.5K    string_not_found, failed_actions
resolved  79%  7           3.21¢   56.6K    failed_actions
no patch  0%   3           3.78¢   33.2K    no_test_patch
failed    53%  13          8.97¢   166.7K   failed_tests, failed_actions
failed    19%  11          7.12¢   112.0K   string_not_found, duplicated_actions, no_test_patch, failed_actions
resolved  56%  10          4.59¢   99.6K
failed    31%  20          $0.13   179.5K   string_not_found, duplicated_actions, no_test_patch, failed_actions
resolved  65%  7           1.78¢   49.2K    failed_actions
resolved  83%  13          5.39¢   139.7K   failed_actions
resolved  88%  7           2.93¢   50.5K
failed    56%  16          7.14¢   189.5K   string_not_found, duplicated_actions, failed_actions
resolved  53%  17          $0.12   193.5K   string_not_found, failed_actions
resolved  25%  15          6.55¢   169.6K   string_not_found, failed_actions
resolved  4%   20          $0.11   239.2K   string_not_found, failed_actions, retries
resolved  71%  20          $0.14   212.2K   failed_tests, duplicated_actions, failed_actions
failed    32%  20          $0.11   179.4K   string_not_found, duplicated_actions, no_test_patch, failed_actions
failed    10%  13          5.38¢   115.9K   failed_tests, duplicated_actions, failed_actions
failed    8%   5           4.64¢   51.3K    no_test_patch
failed    0%   15          $0.12   171.9K   string_not_found, failed_actions
failed    40%  20          $0.16   231.2K   duplicated_actions, no_test_patch, retries
resolved  59%  11          4.43¢   105.5K   failed_actions
failed    13%  9           3.61¢   77.5K    failed_actions
resolved  85%  8           4.06¢   75.6K
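
Since flags are meant to surface common failure modes, the flag annotations on the 25 instances listed above can be tallied by status. The sketch below is only an illustration: the `ROWS` list is transcribed by hand from this page (first of two), and the names `ROWS`, `status_counts`, `flag_counts` and `unresolved_flag_counts` are hypothetical, not part of Moatless EvalTools.

```python
from collections import Counter

# (status, space-separated flags) for the 25 rows on this page, in listed order.
ROWS = [
    ("resolved", "failed_actions"),
    ("failed", "failed_actions"),
    ("resolved", "string_not_found failed_actions"),
    ("resolved", "failed_actions"),
    ("no patch", "no_test_patch"),
    ("failed", "failed_tests failed_actions"),
    ("failed", "string_not_found duplicated_actions no_test_patch failed_actions"),
    ("resolved", ""),
    ("failed", "string_not_found duplicated_actions no_test_patch failed_actions"),
    ("resolved", "failed_actions"),
    ("resolved", "failed_actions"),
    ("resolved", ""),
    ("failed", "string_not_found duplicated_actions failed_actions"),
    ("resolved", "string_not_found failed_actions"),
    ("resolved", "string_not_found failed_actions"),
    ("resolved", "string_not_found failed_actions retries"),
    ("resolved", "failed_tests duplicated_actions failed_actions"),
    ("failed", "string_not_found duplicated_actions no_test_patch failed_actions"),
    ("failed", "failed_tests duplicated_actions failed_actions"),
    ("failed", "no_test_patch"),
    ("failed", "string_not_found failed_actions"),
    ("failed", "duplicated_actions no_test_patch retries"),
    ("resolved", "failed_actions"),
    ("failed", "failed_actions"),
    ("resolved", ""),
]

# How many instances ended in each status.
status_counts = Counter(status for status, _ in ROWS)

# How often each flag appears overall, and on unresolved instances only.
flag_counts = Counter(f for _, flags in ROWS for f in flags.split())
unresolved_flag_counts = Counter(
    f for status, flags in ROWS if status != "resolved" for f in flags.split()
)

for flag, n in flag_counts.most_common():
    print(f"{flag:20s} total={n:2d} unresolved={unresolved_flag_counts[flag]:2d}")
```

On this page, failed_actions appears on both resolved and unresolved instances, while every instance flagged no_test_patch is unresolved. The second page is not included, so the counts cover only half of the run.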

Page 1 of 2