Also interesting is how its explanation of the deobfuscated code, although broadly correct in terms of goal, doesn't accurately describe the steps. Almost as if it's disregarding the code altogether and merely describing another implementation of "resizeImage".
Yup, I got some nice documentation (with a minor syntax error) for an Http package in Lean that doesn't exist. I asked about Lean 4 and it helpfully explained that the non-existent package had moved from Lean to Std.
I’ve been toying around with ChatGPT for a few weeks now and I encountered a few situations in which ChatGPT was like 90% accurate at best. Things like suggesting snippets of configuration files or plugin research. It’s good to get an idea and get started somewhere, but I certainly cannot trust it blindly.
What I've been telling everyone is that you cannot (should not) ask ChatGPT a question whose answer you cannot independently verify yourself.
This is kind of what makes it good for generating code, because everything it generates can be pretty quickly verified and validated by another machine (interpreter/compiler).
Makes it not so great for writing essays on books you didn't read, and especially for doing math you don't understand... because it can't do math AT ALL.
Let's hypothetically assume we have some sort of AGI and we can ask it to write programs and text and nothing else.
Is there anyone on this planet who would think that they don't need to look at the generated code? I mean imagine a manager simply feeding in tickets and getting a finished application out without ever knowing how it was produced.
The application is business critical and any kind of mistake could ruin his business, which puts the manager at the complete mercy of the AI.
Now you might say that this happens with humans as well but when humans cause problems we let other humans review and test their code.
AI causes problems? Let's add more humans. Wait a minute...
> everything it generates can be pretty quickly verified and validated by another machine
It can be verified in the sense that it builds, but that doesn't mean that it actually does what you asked it to do, or that it does it on all valid inputs. The worst bugs to track down are silent logic bugs.
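A toy illustration of that point (hypothetical functions, not from the thread): both versions below run without error and "verify" on most inputs, and only testing the right input exposes the silent logic bug.

```python
def is_leap_year_buggy(year):
    # Looks right and runs fine, but silently ignores the century rules.
    return year % 4 == 0

def is_leap_year(year):
    # Correct Gregorian rule: divisible by 4, except centuries,
    # unless divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# The buggy version agrees on most inputs...
assert is_leap_year_buggy(2024) == is_leap_year(2024)
# ...and silently disagrees on others (1900 is not a leap year).
assert is_leap_year_buggy(1900) != is_leap_year(1900)
```

Both functions compile, run, and pass a happy-path test; only a reviewer who knows the domain (or a property test over the right inputs) catches the divergence.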
For math, I'm kind of surprised that it can't recognize "this is math" and then handle that with normal calculations instead of the language model. I assume we'll see that before long.
I really want Wolfram|Alpha to be integrated into this... that'd be nice. Also, if they could make W|A any faster than a glacier while they're at it, that'd be great.
A good trick is to ask it to translate the request into commands of your choosing. For example, ask it to generate Python code to perform the calculation. Another thing that works well is to turn it into a command extraction problem: give it examples of the kinds of commands you want, and build an interpreter for those commands.
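A minimal sketch of that command-extraction idea (the command names here are invented for illustration): instead of trusting the model's arithmetic, prompt it to emit simple commands and evaluate them yourself.

```python
# Tiny interpreter for model-emitted commands. Rather than asking
# "what is 17 * 23?" and trusting the answer, prompt the model to
# answer with e.g. "MUL 17 23" and compute the result locally.
def run_command(line):
    op, *args = line.split()
    nums = [float(a) for a in args]
    ops = {
        "ADD": sum,                          # ADD a b c ... -> sum
        "SUB": lambda xs: xs[0] - xs[1],     # SUB a b -> a - b
        "MUL": lambda xs: xs[0] * xs[1],     # MUL a b -> a * b
    }
    return ops[op](nums)

assert run_command("MUL 17 23") == 391
assert run_command("ADD 1 2 3") == 6
```

The model only has to do the (easy) translation step; the (hard, for an LLM) calculation is done by deterministic code.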
I agree, we’re not far from that, or we’re there now.
I'm leaning towards using it for things I already know exactly how to do -- including a very clear idea of the result. In these contexts, it can save some mental workload / time.
Don't think so. There's clearly the beginning of a while loop near the top of the obfuscated version. There's no loops at all in the 'de-obfuscated' version.
Here's GPT's own explanation of what the purpose of that while loop is:
---
This code uses JavaScript's `eval` function to obfuscate the code by looping over an array of strings and passing them as arguments to `eval` to create a variable. It also uses an anonymous function to obfuscate the code. The code is deobfuscated by replacing the `eval` function and the anonymous function with their respective strings.
That explanation is also not correct! The while loop is an obfuscation gadget (of sorts), but it doesn't use eval, it uses push and shift to rotate the array. The only use of eval is 'eval("find")', which is not top grade obfuscation.
Yeah, pretty sure that loop is there to modify the string lookup table at runtime, so you can't just statically replace it in the code. It's not strong obfuscation, but in properly deobfuscated code that loop wouldn't exist.
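A sketch of the rotation gadget being described (mimicking the common JavaScript push/shift pattern in Python; the table contents are invented): the code references strings by hard-coded index, but the table is rotated at runtime, so statically resolving indices against the original table gives wrong answers.

```python
# Shuffled string table as it appears in the obfuscated source.
table = ["log", "shift", "hello", "push", "console"]

def rotate_until(table, key, want):
    # push/shift equivalent: move the head to the tail until a
    # checksum-like condition holds.
    while table[key] != want:
        table.append(table.pop(0))

rotate_until(table, 0, "console")
# Only after the runtime rotation do the hard-coded indices resolve:
assert table[0] == "console"
assert table[1] == "log"
```

A proper deobfuscator would run (or symbolically evaluate) the rotation once, inline the resolved strings, and delete the loop entirely.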
Yeah an example I was shown was python code to process some data. It was 30 lines of correct-looking trivial boilerplate code, except for one regex to do the actual processing. The regex was hopelessly wrong.
Clearly if you didn't know how to write the other 29 lines of code there's no way you are going to be able to debug the regex.
The optimistic way to look at it though is that it wrote the boring 29 lines that you didn't want to write and got you straight to the actual problem that needs solving.
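A hypothetical illustration of the failure pattern described (the date-extraction task and both regexes are invented for this example): the boilerplate around the regex is trivially right, while the regex itself looks plausible but matches the wrong things.

```python
import re

# Plausible-looking but wrong: matches any digit runs joined by dashes,
# including things that are not dates at all.
plausible_but_wrong = re.compile(r"\d+-\d+-\d+")

# What was actually wanted: ISO-style dates only.
correct = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

text = "versions 1-2-3 shipped on 2023-01-15 and 2023-02-01"
assert plausible_but_wrong.findall(text) == ["1-2-3", "2023-01-15", "2023-02-01"]
assert correct.findall(text) == ["2023-01-15", "2023-02-01"]
```

Both regexes "work" in the sense that the script runs; only someone who can read regexes and check the output spots that the first one is wrong.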
We recently had an ad-hoc experiment like this as well: "give us basic config management code to download a service, add a systemd service for it, deploy a config, and set up reloading of the service".
And it had some funny mistakes in there: a task called "Reload service XYZ" that was actually a hard restart of the service, some rather silly file locations, and such, sure.
But at the same time, it saved us an hour or two of boilerplate setup and even dug up a somewhat smart way to validate the configuration for this very specific service. This allowed us to jump more into understanding the service, tuning the config, and setting up good tests for the setup, instead of writing the same boring 20 resources in a config management tool.
I guess I could also ask if we could have some better form of service or config management which eliminates this boilerplate... but ChatGPT made our current day-to-day work a little easier there.
Yes, and honestly I think this is the actual potential win here, especially in boilerplate-heavy languages (Java, I'm looking at you in particular). So if this turns out to be the case it could be good for programmer productivity while skewing the dev landscape towards tools, frameworks, languages etc that the prevailing AI models work well with.
It has this really amazing and terrifying quality of being a really good bullshitter. I asked it an AWS question once and it gave me 4 very convincing-sounding answers. I went to try them. 2 of them were complete bullshit, as in, the commands don't even exist. The only good answer was the one I already had. It's in this uncanny valley of bullshitting. It can be quite dangerous in some situations, especially if one is lulled into trusting it.
It often happens that ChatGPT will confidently give you something that _looks_ like what you're asking for despite it being awfully wrong - sometimes you can make it "understand" its mistake and correct it, sometimes not. It's usually not that far off, but trusting it blindly is just out of the question.
I was having ChatGPT give me wildly wrong answers, and when I asked for a source it provided me fake websites and confidently quoted information from those sites that have never existed.
It's good enough. I know zero powershell, but I know other languages enough to understand the common grammar. With ChatGPT I'm a fairly rapid powershell programmer right off the bat - as evidenced by the script I've been writing this afternoon. I don't know any of the (overcomplicated) syntax, but now I don't have to.
Is it ready for production? Maybe not. Is it amazing and inevitably going to get better? Yes. Does it make a lot of human labor redundant in the very foreseeable future? Also yes.
It's just like any other AI system: it returns best-effort results with a confidence percentage that doesn't map well to binary outcomes.
So yes, it can be accurate. But there are scenarios where it must be strictly, binarily correct, and it's not great at that bit.
I’ve had it confidently tell me to use Python libraries that don’t exist, pass parameters to methods that aren’t in the method signature, and to write code that had to be debugged and fixed.
I’m still excited to use it, but you have to know enough about coding to ensure correctness. It’s nowhere near possible for a non-coder to build a complicated app with (so far).
It's been my experience that ChatGPT gets things wrong sometimes. It's also been my experience that if you say "X isn't working as expected." it will do its best to fix the issues. It usually does a pretty good job of fixing it.
I had it write a handful of scripts for me yesterday. It got about 90% of it right on the first pass and 99% of it right on the second pass. You still need to have some understanding of what you're doing so you can see when things are wrong but man if it doesn't save you a lot of time.
and then it generated something that looks like valid source based on that garbage.
Is the source it spat out even runnable? Then it is not the same program as the input, and (any way you spin it) nothing has been deobfuscated; it has just dreamed about the prompt a bit and shown you its dream diary notes. Wake up: this is statistics, not a magic Swiss army knife API.
Every commenter is jumping in asking “is it correct?” Even if it’s not 100%, if it’s at least reasonably close, it could be a tremendous force-multiplier against obfuscation for someone with some familiarity with roughly what the code is trying to do.
Correctness is a big deal here. This is a security context and we can assume that the obfuscators are active attackers against legible code, not just people passively hoping that their obfuscated code is obfuscated. If this becomes a popular technique, then code obfuscation tools will simply pivot to writing code that ChatGPT gets wrong when asked to unobfuscate it.
I can't even imagine that would be a particularly hard thing to do, especially if it isn't correct even before actively attacking ChatGPT! Fooling it even harder won't be terribly difficult. This is advantage attacker overall.
I imagine it would be as easy as using some cognitively loaded, but wrong, terms as variable names instead of short letters and numbers. Ask ChatGPT "please unobfuscate this network code" and get back a substring search algorithm because the network code was written with a dozen variants on "haystack" and "needle" for variable names, for instance.
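A sketch of that attack (entirely hypothetical code): the identifiers all scream "substring search", but the behavior is something else entirely; here, a Fletcher-style checksum with deliberately misleading names. A reader (or a language model leaning heavily on identifiers) could easily describe it as the wrong algorithm.

```python
def count_needles(haystack, needle):
    # Despite the name, this ignores `needle` entirely and computes a
    # Fletcher-style checksum of `haystack`.
    a = b = 0
    for byte in haystack.encode():
        a = (a + byte) % 255
        b = (b + a) % 255
    return (b << 8) | a

# The `needle` argument has no effect, which no "substring count"
# description would predict. Checking behavior is the only defense.
assert count_needles("hello", "ll") == count_needles("hello", "zz")
```

If a deobfuscator trusts names over semantics, this kind of adversarial labeling is cheap for the obfuscator and expensive for the analyst.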
ChatGPT being actively wrong would be a step back for such deobfuscators then, not a positive at all.
I don't really see this as a problem. Once you have a first cut deobfuscation from this you can refine it with other methods, like comparing input/output examples between the original and the deobfuscated version, or even use something more sophisticated like symbolic execution [1] or differential fuzzing [2] to systematically look for divergence between the behavior of the two. You could even feed these back in to ChatGPT and ask it to redo the deobfuscation given a failing test case.
Such testing won't be able to prove that the two are equivalent (unless it's exhaustive) but with decent coverage of the original you can get some good confidence. The goal of deobfuscation is usually understanding, so I'm not sure you need strong guarantees of perfect semantic equivalence with no human intervention/judgment.
And of course, existing deobfuscators have bugs and aren't guaranteed to preserve semantics either.
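The input/output comparison mentioned above can be sketched as a tiny differential-testing harness (the two functions are stand-ins, not real obfuscated code): run the original and the candidate deobfuscation on many random inputs and report the first divergence, which could then be fed back into the prompt as a failing test case.

```python
import random

def original(x):
    # Stand-in for the obfuscated program.
    return (x * 2 + 4) // 2

def candidate(x):
    # Stand-in for the model's proposed deobfuscation.
    return x + 2

def find_divergence(f, g, trials=10_000, seed=0):
    # Compare f and g on random inputs; return a counterexample or None.
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.randint(-10**6, 10**6)
        if f(x) != g(x):
            return x
    return None

# Here the candidate happens to be equivalent, so no divergence is found.
assert find_divergence(original, candidate) is None
```

This can't prove equivalence (it isn't exhaustive), but it cheaply detects the "utterly wrong" case, which is the scenario the parent comment worries about.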
You seem to have just blipped by the deobfuscation from ChatGPT being actively wrong.
I meant what I said. I expect ChatGPT would happily output a substring search algorithm for the accept loop of an HTTP server if you just put enough "haystack" and "needle" words in the obfuscated code. How are you supposed to "refine" that into the truth?
To the extent that there is an answer, the answer is, completely ignore the ChatGPT output and use existing tools. Which is to say, ChatGPT would be worse than useless at that point.
I'm not saying ChatGPT will be slightly off, and maybe the obfuscator can kick it to be another 5 or 10% wrong. I'm saying, it is likely trivial to update the obfuscator to make ChatGPT utterly wrong, in every detail, up to and including the entire fundamental nature of the code.
When it gets it entirely wrong that will be trivially detected by an I/O example, no? So I don't see that as dangerous, just inconvenient (it sometimes doesn't work, but you know when it doesn't work). You can also use an existing semantics-preserving deobfuscator and then use that as the input to an LLM deobfuscator instead of the original.
If you're saying that obfuscators can eventually adapt, then sure. So can deobfuscators. This particular problem is kind of inherently an arms race.
A counter-counter measure could be to re-obfuscate the code with a traditional obfuscator/minifier, stripping off all the clever misleading tricks and run ChatGPT on that.
A lot of people are bought into the fantasy that AGI is just around the corner and don't understand that LLMs are a statistical model of words which possesses no reasoning ability.
Imagine a Stack Overflow where 10% of all answers look correct, but are actually completely fictional (as in, not misunderstanding the question, providing outdated information etc., but completely making up terminology and solutions) – without any cues to tell you which is which.
int main(int b,char**i){long long n=B,a=I^n,r=(a/b&a)>>4,y=atoi(*++i),_=(((a^n/b)*(y>>T)|y>>S)&r)|(a^r);printf("%.8s\n",(char*)&_);}
If you want to get the same (or similar) results:
curl https://api.openai.com/v1/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "code-davinci-002",
"prompt": "Deobfuscate the following C code:\n\nint main(int b,char**i){long long n=B,a=I^n,r=(a/b&a)>>4,y=atoi(*++i),_=(((a^n/b)*(y>>T)|y>>S)&r)|(a^r);printf(\"%.8s\\n\",(char*)&_);}",
"temperature": 0,
"max_tokens": 256,
"top_p": 1,
"frequency_penalty": 0,
"presence_penalty": 0
}'
That one loops in its response:
The code is obfuscated with the following rules:
* `B` is the value of `b`
* `I` is the value of `i`
* `T` is the value of `y`
* `S` is the value of `_`
Changing to: "model": "text-davinci-003",
int main(int b, char** i) {
long long n = b, a = *i[0] ^ n, r = (a / b & a) >> 4, y = atoi(*++i), _ = (((a ^ n / b) * (y >> 8) | y >> 7) & r) | (a ^ r);
printf("%.8s\n", (char*)&_);
}
Which... isn't exactly helpful.
Asking it to explain it instead:
This code is a C program that prints out a string. The program takes two parameters, b and i, and assigns them to variables n and a. Then the program performs various calculations on those variables and the parameters, such as shifting and bitwise operations. Finally, the program prints out the result of the calculations, which is stored in the variable _, as an 8-character string.
Wow, I wonder how many "bytes of state" ChatGPT must have internally to be able to do that transform. Or does it guess from certain sequences and just write something plausible?
It would be interesting to test if it can solve "memory hard" problems, like repeated obfuscation or hashing, that would require large internal state. Then you could map how large the equivalent memory would be.
Also interested about its accuracy. I built a C/C++ obfuscator[0] a while back, I'll have to make a note to see how well ChatGPT deobfuscates its output.
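The "memory hard" probe suggested above could look like this sketch (the input string and round count are arbitrary): repeated hashing is trivial for a program but requires carrying exact intermediate state across every round, which a model predicting text cannot shortcut.

```python
import hashlib

def repeated_sha256(data: bytes, rounds: int) -> str:
    # Chain SHA-256 `rounds` times; each round's output is the next
    # round's input, so there is no way to skip intermediate state.
    for _ in range(rounds):
        data = hashlib.sha256(data).digest()
    return data.hex()

# Ask the model for this value and check it against the real
# computation: a correct answer would imply it actually calculated.
digest = repeated_sha256(b"chatgpt", 1000)
assert len(digest) == 64
```

Varying the round count would map out roughly how much equivalent working memory, if any, the model can bring to bear.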
According to Alex's own tweet, ChatGPT originally produced the obfuscated code too.
In my head, this is like asking someone to translate "Hello" into French and then asking them to translate "Bonjour" back to English. It proves nothing about capabilities or usefulness.
Especially given that the "obfuscated code" is not syntactically valid. Even if you repair the syntax, it contains a number of errors, and eventually descends into gibberish (although some of what it does is not bad!). There is nothing to deobfuscate here.
I asked ChatGPT to write me a fibonacci sequence generator suitable for entry into the obfuscated C contest. And it actually spit out something reasonably well obfuscated (used lots of macros).
It's not some magic bullet. It helps a ton with trying to give names to obfuscated function and variable names, but you have to be intelligent enough to know what the code's actually doing. It probably helps RE teams a lot, but until it can easily run across an entire codebase it's just another tool in the toolbox.
Agreed, it can lack logic until you spell it out. An example was this CTF challenge: it just couldn't understand the hash collision until I gave it the full write-up [0]
W.r.t. codebases, I may look at some of the free models (as this gets around the cost problem) and try to feed it prompts as a block of code plus meaningful references to the same, under the token limit.
From my own research, CTF tools like angr can build ASTs, so I'm working on the theory that I can train the AI to review interesting parts of the execution tree. Happy to get feedback or papers, since this has been the most interesting find so far:
https://arxiv.org/abs/1906.12029
I previously did this. I wrote a naive integer factorization program in C, compiled it, extracted the disassembly and intentionally broke it.
It generated a working C function that was almost correct given the broken assembly. I then “talked” with it to improve the code, and even suggested that the original disassembly contained an error. It was surprisingly good.
Note: I broke the disassembly intentionally because when I presented the original disassembly it immediately outputted the/a C program to factorize integers.
I have used ChatGPT somewhat successfully to decompile assembly in to C and C++. It's making a lot of mistakes but despite all of this, it's very helpful.
I have to say, I find all the comments dismissing ChatGPT hilarious. I read them in a funny grandpa voice. However, we should look past the insignificant details. The main achievement is that we now have a really capable unstructured text-to-computer interface. We can hook it up to anything and it will give us answers with whatever properties we desire, in whatever shape we can think of.
But they're not small flaws if you're relying on it to eg replace a person's job.
If I ask it for the dimensions of a product and it gives the wrong figures, it should just tell me it doesn't know instead of inventing something.
That's the problem. It doesn't tell you when something is wrong so you can never trust if it's right unless you happen to know the field. That makes it far less useful.
I've had a few instances where it returned bad code or was unable to solve a challenge, however most were fixed by better prompts, or by clarifying prompts. In a way I think there is a two-way "learning" process going on here. I'm training it how to give me what I ask for, and it trains me how to ask for what I want it to give me.
The tricky bit IMO is when you’re at the threshold of being able to identify errors it makes. I tested some situations a while ago where I asked for some physics calculations functions. I’m an experienced programmer but haven’t really done anything with physics since high school 20 years ago. The code returned looked plausible and would run, but going through it line by line and looking up the real formulas it turned out to be super wrong.
You've glossed over the critical detail: avoiding garbage output is the critically significant problem with AI. Right now, ChatGPT has the ability to generate structured results from whatever format we can think of to whatever format we can think of. They may or may not be correct.
It is only a critically significant problem for some applications. People have already come up with a variety of potential solutions for this problem, for different classes of problems, including integration with Wolfram Alpha.
However, verifying results is and will always be important. Iterating on the prompts and alternative paths is also important.
But even as it is right now, it is very useful to a lot of people. Remember, a lot of people accepted self-driving cars that can drive themselves off roads and crash into trailers, even paid extra for it, while this is just text generation for now.
I see that it is going to transform everything. No technical revolution has ever been more readily apparent. We now have an interface to talk to computers that can translate unstructured raw language into other formats. This is the real innovation that will unlock the power of more specialized, more advanced models in the near future.
What escapes a lot of people is that ChatGPT was a battle won in the UI/UX realm first and foremost. GPT-3 was old news, but throw a new paradigm and interface on it and here we are.
Incorrect. ChatGPT is markedly more capable than GPT3 because of its improved model (instructgpt and supervised fine tuning). Anyone who has played with both can attest to that.
The response was generated by ChatGPT with a prompt "make a snarky response". I just found it amusing to use the technology itself against its own proponent who insinuated that everyone having something against the technology is somehow a tech averse "grandpa".
So, all I've learned is that ChatGPT knows the obfuscated and de-obfuscated versions of code that it itself has generated.