John(Yueh-Han) Chen (@jcyhc_ai)'s Twitter Profile
John(Yueh-Han) Chen

@jcyhc_ai

AI Research @berkeley_ai

ID:1698607819149418496

Link: https://www.linkedin.com/in/yueh-han-chen/

Joined: 04-09-2023 08:04:10

15 Tweets

99 Followers

231 Following

John Yang (@jyangballin)

SWE-agent is our new system for autonomously solving issues in GitHub repos. It gets similar accuracy to Devin on SWE-bench, takes 93 seconds on avg + it's open source!

We designed a new agent-computer interface to make it easy for GPT-4 to edit+run code
github.com/princeton-nlp/…

Carlos E. Perez (@IntuitMachine)

1/n Are AI Forecasters the Next Big Thing? How A.I. Learned to Make Freaky Accurate Forecasts

New research unveils an ingenious system that unlocks the untapped potential of artificial intelligence to predict the future. Drawing on the burgeoning prowess of language models to…

Owain Evans (@OwainEvans_UK)

Great paper, showing impressive performance from LLMs for forecasting using an elegant retrieval, reasoning + finetuning approach. I expect further improvements in forecasting are possible with current LLMs with more iteration.

Fred Zhang (@FredZhang0)

Beating prediction markets with chatbots sounds cool. In recent work (arxiv.org/abs/2402.18563), we get somewhat close to that.

As another perspective, forecasting is a great capability domain to benchmark LM reasoning, calibration, pre-training knowledge, and more. 🧵1/n

Jacob Steinhardt (@JacobSteinhardt)

Can we build an LLM system to forecast geo-political events at the level of human forecasters?

Introducing our work Approaching Human-Level Forecasting with Language Models!

Arxiv: arxiv.org/abs/2402.18563
Joint work with Danny Halawi (@dannyhalawi15), Fred Zhang (@FredZhang0), and John(Yueh-Han) Chen (@jcyhc_ai)

Dan Hendrycks (@DanHendrycks)

GPT-4 with simple engineering can predict the future around as well as crowds:
arxiv.org/abs/2402.18563
On hard questions, it can do better than crowds.

If these systems become extremely good at seeing the future, they could serve as an objective, accurate third-party. This would…
