> GPT-3 struggles with large numbers, decimal numbers, and negative numbers. When used it returns answers that are close but often incorrect.
Regarding GPT-3's "guesstimates," intuitively it feels like the network has to guess because it hasn't been given a way to do exact computation--a neural network is built out of nonlinear functions--even if it "understands" the prompt (for whatever value you want to give to "understand").
Are there any techniques that involve giving the model access to an oracle and allowing it to control it? To continue the analogy, this would be the equivalent of giving GPT-3 a desk calculator.
If this is a thing, I have other questions. How do you train against it? Would the oracle have to be differentiable? (There are multiple ways to operate a desk calculator to evaluate the same expression.) Also, what control interface would the model need so that it can learn to use the oracle? (Would GPT-3 emit a sequence of one-hot vectors representing operations to perform, and would the calculator have "registers" that can be fed directly from the input text? Some way of indirectly referring to operands so the model doesn't have to handle them lossily.)
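One non-differentiable route people take is to keep the oracle outside the model entirely: the model emits a textual "calculator call", a wrapper intercepts it, evaluates it exactly, and splices the result back into the text. A minimal sketch, where the CALC(...) marker, the stub model, and the use of eval as a stand-in calculator are all made up for illustration:

```python
import re

def run_with_calculator(model, prompt):
    """Generate text, and whenever the model emits CALC(expr),
    evaluate expr exactly and splice the result back in."""
    text = model(prompt)
    # Look for a hypothetical CALC(...) marker in the model's output.
    match = re.search(r"CALC\(([0-9+\-*/. ()]+)\)", text)
    if match:
        # eval here is a stand-in for a real, sandboxed calculator oracle.
        exact = eval(match.group(1))
        text = text.replace(match.group(0), str(exact))
    return text

# Stub "model" that has learned to defer arithmetic to the oracle.
def stub_model(prompt):
    return "The answer is CALC(3 + 6)."
```

With this design the oracle never needs to be differentiable, because the model is only trained to produce the calling syntax, not to backpropagate through the calculator.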
There are many papers trying to couple language models with external modules.
In the Retrieval-Enhanced Transformer (RETRO) paper, a large language model was coupled with a similarity-based text index. It can populate the prompt with relevant information from the index, making it more grounded and easier to update.
In another paper (AlphaCode) the language model was coupled with a compiler: it could run programs and check whether they matched the expected outputs for a few test cases. The model was able to solve competition-style coding problems at above the average human score.
In another paper (Language Models as Zero-Shot Planners) a language model generates commands to navigate a virtual home environment and perform tasks. The knowledge in the LM helps it learn tasks quickly.
A recent one can learn new concepts through simple conversation, then apply them where necessary. You can talk-train your model. ("Memory-assisted prompt editing to improve GPT-3 after deployment")
So the trend is to add "toys" to language models: a simulator, a compiler, a search engine, a long-term memory module.
I'd like to see a recursive language model, that can sub-call itself to decompose problems.
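A toy sketch of what that control flow might look like, with a hypothetical stub standing in for the model (the DECOMPOSE/ANSWER/COMBINE prompts, the ATOMIC marker, and the ";" separator are all made up for illustration):

```python
def solve(problem, model, depth=0, max_depth=3):
    """Ask the model either to answer directly or to decompose the
    problem into subproblems, which are solved by recursive sub-calls."""
    if depth >= max_depth:
        return model("ANSWER: " + problem)
    plan = model("DECOMPOSE: " + problem)
    if plan.startswith("ATOMIC"):
        return model("ANSWER: " + problem)
    subproblems = plan.split(";")
    partials = [solve(p, model, depth + 1, max_depth) for p in subproblems]
    return model("COMBINE: " + " | ".join(partials))

# Deterministic stub model so the recursion is visible end to end.
def stub(prompt):
    if prompt.startswith("DECOMPOSE: big"):
        return "a;b"          # "big" splits into two subproblems
    if prompt.startswith("DECOMPOSE"):
        return "ATOMIC"       # everything else is answered directly
    if prompt.startswith("ANSWER: "):
        return prompt[8:].upper()
    return prompt[9:]         # COMBINE: just echo the partials
```

The depth cap matters: without it, a model that keeps decomposing would recurse forever, which is the same termination worry raised further down the thread.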
Yeah, but I didn't bring it up because I wasn't sure how much is really the model choosing and how much is the human workflow: they emphasize the interactive part heavily.
Anyway, today another great paper dropped on self-distillation: "STaR: Bootstrapping Reasoning With Reasoning" https://arxiv.org/abs/2203.14465 , Zelikman et al 2022.
> I'd like to see a recursive language model, that can sub-call itself to decompose problems.
I tried a very simple and specific version of this a few years ago (Recursive Application of Recurrent Neural Networks) and it worked great for intent parsing: https://github.com/spro/RARNN
Would like to see what "real" researchers with more modern models could do with the concept.
> The model was able to solve competition style coding problems above average human score.
I am not sure if I am thinking of the right study, but as far as I remember the pipeline included a human wading through and filtering solutions, and while there may have been a compiler attached, they also scored themselves. The marketing blurb of course tried to make it sound as if they had competed.
The model generates a large number of solutions; they then filter for those that actually compile and produce the right output when executed, then cluster them to select a few (<10) solutions and submit those. They are not allowed to submit too many attempts.
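The filter-then-cluster step described above can be sketched roughly like this. Note this simplifies AlphaCode considerably: here candidates are grouped by their behaviour on a single probe input, standing in for the paper's clustering over many generated inputs, and all the names are made up:

```python
def select_submissions(candidates, run, example_in, example_out, probe_in, k=10):
    """Keep candidates that pass the example test, group the survivors
    by their behaviour on a probe input, and submit one per group."""
    passing = [c for c in candidates if run(c, example_in) == example_out]
    clusters = {}
    for c in passing:
        # Programs that behave identically on the probe are likely duplicates.
        clusters.setdefault(run(c, probe_in), []).append(c)
    # One representative per behavioural cluster, capped at k submissions.
    return [group[0] for group in clusters.values()][:k]
```

The point of clustering is that many of the generated programs are near-duplicates, so submitting one representative per behaviour class spends the limited attempts on genuinely different strategies.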
Ah, the paper describes a fixed method for the last selection step, and also AI-generated tests to narrow the results further before that. Quite a bit better, even if the participation is still only simulated.
I believe the dominant thinking is that GPT-3 has trouble with math because it doesn't see individual digits. It obviously has no trouble working on words, which are much more discrete than numbers. I wouldn't be surprised if it had trouble carrying out a long equation, though. When writing, it can reconsider the whole context with each new word, externalizing that memory, but with most computations it would have to carry out the whole thing in one go. That's a lot of dedicated parameters for a single subtask.
Even the tokenization is wonky. Imagine if you had no concept of math characters and instead had a lookup table of common n-grams (BPE encoding). For example, the binary addition "3+b" may be tokenized as a single "unary" token because "3+b" occurs commonly. That tokenization is vastly different from the one for "3.00000001+b". GPT has to invert this tokenization artifact with finite training data.
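A toy illustration of the effect, using greedy longest-match as a crude stand-in for real BPE and a made-up merge vocabulary (these are not GPT's actual merges):

```python
def greedy_bpe(text, vocab):
    """Greedy longest-match tokenization, a crude stand-in for BPE."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest remaining piece first; single chars always match.
        for length in range(len(text) - i, 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in vocab:
                tokens.append(piece)
                i += length
                break
    return tokens

# A made-up vocabulary where the common string "3+b" got merged.
vocab = {"3+b", "00", "000"}
```

With this vocabulary, "3+b" becomes a single opaque token, while "3.00000001+b" shatters into pieces like "000" and "0" that share nothing with it, so the model never sees a stable representation of the "+" structure.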
Yeah, I think that's the most accepted explanation. Everything after my first sentence was total speculation, the tokenization is usually cited as the issue.
> with most computations it would have to carry out the whole thing in one go
Is there a way to allow models to say "let me think about this some more"? With language models like GPT-3 you emit one token per inference iteration, with its previous output fed back in as input/state. Can models opt out of providing a token, but still update state? That would allow it to break up the computation into discrete steps.
The RNN outputs a "confidence" bit which can guide the computation to perform more steps and obtain more confidence in the result. Essentially, the RNN asks "let me think about that some more".
But a separate ablation study found that if you just drop the confidence bit altogether and let the RNN compute more every time (e.g., always perform 4 computation steps on a single input for 1 output), you get the same or better results without the extra complexity in training.
There is also a Microsoft Research paper I can't find right now about variable computation for image classification, where there is a "confidence" bit at some of the later layers: if a lower layer is confident enough, its output is used for classification; otherwise the output of that layer is passed on for further transformation by the upper layers.
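The fixed-steps variant from that ablation is easy to sketch. This is a generic toy tanh cell in numpy, not the architecture from any specific paper:

```python
import numpy as np

def rnn_cell(state, x, W_h, W_x):
    """One step of a toy tanh RNN cell."""
    return np.tanh(state @ W_h + x @ W_x)

def step_with_pondering(state, x, W_h, W_x, n_steps=4):
    """Instead of a learned 'confidence' bit deciding when to stop,
    always apply the cell a fixed n_steps times per input token."""
    for _ in range(n_steps):
        state = rnn_cell(state, x, W_h, W_x)
    return state
```

The appeal is exactly what the comment says: no halting signal to train, no extra loss term, just a constant factor more compute per input.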
> But, separate ablation study found that if you just drop confidence bit altogether and allow RNN to compute some more every time (e.g., always perform 4 computations on single input for 1 output), you get same or better results without extra complexity of training.
Did they say what happens if you do both? Perhaps the "benefit from more computation per cycle" phenomenon and the "benefit from signalling relative computation resource allocation" one are different.
I guess I’ll have to try and read the paper, but I’m new to the literature and am clueless about the current state of research.
I believe GPT-3 has a transformer-based architecture, so it doesn't recursively ingest its own output in each iteration. I believe attention-based transformer models have enough complexity to be able to learn what you are talking about on their own.
GPT-3's transformers only recur some finite amount. Attention does a lot compared to a bog standard RNN, and probably if the numbers were tokenized it would be enough for most reasonable computations, but eventually you definitely would hit a cap. That's probably a good thing, of course. The network and training are Turing complete together, but it would suck if the network itself could fail to terminate.
Thank you for pointing out the difference. I went and reread about transformers; previously I thought they were a kind of RNN. (I am not an ML engineer.)
That would be neat. You could give it backspace and "let me think more" tokens that would signal the inference program to run it again on the prompt plus its own output. That way it could generate "thoughts thoughts thoughts [THINKMORE] thoughts thoughts thoughts [THINKMORE] [BACKSPACE]x8 (the real output would go here)".
It would of course have to be penalized in some way for [THINKMORE]ing, to avoid infinite processing time. It would have to learn to reason about the point at which diminishing returns kick in from continuing to [THINKMORE] vs. recording its best answer. The penalization function would also have to take into account the remaining tokens that would fit in the transformer prompt.
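The inference-side half of that scheme can be sketched as a decoding loop. The [THINKMORE] token, the per-round penalty, and the stub model are all hypothetical:

```python
def decode_with_pondering(model, prompt, max_rounds=5, penalty=0.1):
    """Re-run the model on prompt + its own scratch output for as long
    as it ends a round with [THINKMORE], charging a small score penalty
    per extra round so pondering has a cost."""
    context, score = prompt, 0.0
    out = ""
    for _ in range(max_rounds):
        out = model(context)
        if out.endswith("[THINKMORE]"):
            context += out          # keep the scratch thoughts around
            score -= penalty        # discourage unbounded pondering
        else:
            return out, score
    return out, score               # hard cap: forced to answer

# Stub model: "thinks" twice, then answers.
def stub(context):
    if context.count("[THINKMORE]") < 2:
        return "hmm [THINKMORE]"
    return "42"
```

The max_rounds cap plays the role of the prompt-length limit: eventually the model runs out of room to think and must commit to an answer.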
I think it would work, but backprop would be computed in a different way every time. I'm not an expert, so there may be sneaky ways around it, but I'm pretty sure you'd lose out on a long history of little efficiency improvements when you could just make it more recurrent instead.
Hardcoding a tokenization tweak that keeps individual digits separate would be a trivial change to the preprocessing that would not affect the rest of the model training process.
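For instance, a pre-tokenization pass that forces every number apart into single digits could be as small as one regex (this version is illustrative, not what any particular model ships):

```python
import re

def split_digits(text):
    """Insert a space between every pair of adjacent digits so BPE
    can never merge '123' into a single multi-digit token."""
    return re.sub(r"(?<=\d)(?=\d)", " ", text)
```

Everything downstream of the preprocessor, including the model architecture and training loop, is untouched.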
Not super well in the GPT-2 based models I have access to. It falls into different error modes though, diving into prose rather than even making an attempt. Makes sense in retrospect!
Yeah for sure. With energy prices soaring, Moore's law being morally over since 2010, wages being so completely destroyed by the hatred Democrats have for them, and the sneaky little misconceptions and errors the golem's makers did not fight hard enough to let in, AI will be supplanted by plain I.
Check out my project https://github.com/Thopliterce/transformer-arithmetic. This is a concrete implementation based on the GPT-2 model that does multiplication accurately, digit by digit. It does so by generating a dataset that teaches the model how to do multiplication step by step. Doing arithmetic actually works with just GPT-2, without an oracle.
That's actually pretty straightforward: (Tested with EleutherAI GPT-J-6B because why use a closed model when an open one exists?)
Prompt:
"Question: Solve three plus six.
Answer:
a=3
b=6
a+b
Question: Solve twelve times fifteen.
Answer:
a="
And the model dutifully answered:
"a=12
b=15
a*b"
Which you could feed directly to a python console.
This kind of approach, where you craft a long prompt to make the model understand the kind of result you want, is called "prompt engineering", and I find it crazy how close we are getting to robopsychology.
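Actually piping the completion into Python deserves a little care, since eval'ing raw model output is unsafe in general. A minimal sketch that only accepts simple assignments and arithmetic over lowercase names (the allow-list regex is my own, illustrative choice):

```python
import re

def execute_completion(completion):
    """Run lines like 'a=12', 'b=15', 'a*b' and return the last value.
    Only lowercase names, digits, and arithmetic operators are allowed."""
    env = {}
    result = None
    for line in completion.strip().splitlines():
        line = line.strip()
        if not re.fullmatch(r"[a-z0-9=+\-*/. ()]+", line):
            raise ValueError("refusing to run: " + line)
        if "=" in line:
            name, expr = line.split("=", 1)
            env[name.strip()] = eval(expr, {}, env)
        else:
            result = eval(line, {}, env)
    return result
```

Fed the model's completion from above ("a=12", "b=15", "a*b"), this returns the exact product, with the language model acting only as a translator from words to expressions.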
Well, the theory around neural nets strongly suggests that enough nonlinear activation functions combined in the right way should be able to learn any function, including basic arithmetic. Now, whether or not you have the right approach to training the network to get the right set of weights is a different story...
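Addition in particular is even linear, so the "right set of weights" exists and is trivially reachable: a single linear neuron fits y = a + b exactly. A minimal numpy sketch of learning it by gradient descent on random examples:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-10, 10, size=(256, 2))   # training pairs (a, b)
y = X[:, 0] + X[:, 1]                     # target: a + b

w = np.zeros(2)                           # one linear neuron, no bias
for _ in range(200):
    # Gradient of mean squared error for the linear model X @ w.
    grad = 2 * X.T @ (X @ w - y) / len(X)
    w -= 0.01 * grad

# The weights converge to [1, 1], i.e. exact addition on any inputs.
```

The hard part GPT-3 faces isn't representational capacity for the arithmetic itself; it's that nothing in next-token training on BPE-mangled text pushes it toward this clean solution.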