As someone who has a deeper knowledge of programming rather than math, I find th...

spi · on Jan 1, 2024

Sharing my experience here. My background is in math (Ph.D. and a couple of postdoc years) before switching to practitioner in deep learning. This year I taught a class at university (as invited prof) in deep learning for students doing a masters in math and statistics (but with some programming knowledge, too).

I tried to present concepts in as mathematically accurate a way as reasonably possible, and in the end I cut a lot of the math, in part to avoid the heavy notation which seems to be present in this book (and in part to make sure students could use what they learnt in industry). My actual classes had way more code than formulas.

If you want to write everything very accurately, things get messy, quickly. Finding a good notation for new concepts in math is very hard, something that gets sometimes done by bright minds only, even though afterwards everybody recognizes it was “clear” (think about Einstein notation, Feynman diagrams, etc., or even just matrix notation, which Gauss was unaware of). If you just take domain A and write in notations from domain B, it’s hard to get something useful (translating quantum mechanics to math with C* algebras and co. was a big endeavour, still an open research field to some extent).

So I’ll disagree with some of the comments below and claim that the effort of writing down this book was huge but probably scarcely useful. Those who can comfortably read these equations probably won’t need them (if you know what an affine transformation is, you hardly need to see all its ijkl indices written down explicitly for a 4-dimensional tensor), and the others will just be scared off. There might be a middle ground where it helps some, but at least I haven’t encountered such people…
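A minimal Python sketch of the affine-transformation point (not from the book; the function names and toy data are made up for illustration). It writes the same map once with all four indices spelled out and once as y = A v + c after flattening, and checks that the two agree.

```python
from itertools import product


def affine_indexed(W, x, b):
    """y[i][j] = sum over k, l of W[i][j][k][l] * x[k][l] + b[i][j], every index explicit."""
    I, J = len(b), len(b[0])
    K, L = len(x), len(x[0])
    return [
        [
            sum(W[i][j][k][l] * x[k][l] for k, l in product(range(K), range(L))) + b[i][j]
            for j in range(J)
        ]
        for i in range(I)
    ]


def affine_flat(A, v, c):
    """The same map after flattening index pairs: y = A v + c."""
    return [sum(a * t for a, t in zip(row, v)) + c_i for row, c_i in zip(A, c)]


# Toy integer data (hypothetical), so the comparison below is exact.
I, J, K, L = 2, 2, 2, 3
W = [[[[i + 2 * j + k - l for l in range(L)] for k in range(K)] for j in range(J)] for i in range(I)]
x = [[1, 2, 3], [4, 5, 6]]
b = [[10, 20], [30, 40]]

y_indexed = affine_indexed(W, x, b)

# Flatten (i, j) into a row index and (k, l) into a column index.
A = [[W[i][j][k][l] for k in range(K) for l in range(L)] for i in range(I) for j in range(J)]
v = [x[k][l] for k in range(K) for l in range(L)]
c = [b[i][j] for i in range(I) for j in range(J)]
y_flat = affine_flat(A, v, c)

assert [y for row in y_indexed for y in row] == y_flat
```

Both versions compute the same numbers; the indexed form only makes the bookkeeping visible.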

layer8 · on Jan 1, 2024

Mathematical notation is more concise, which may take some getting used to. One reason is that it is optimized for handwriting. Handwriting program code would be very tedious, so you can see why mathematical notation is the way it is.

Apart from that, there is no “the code” equivalent. Mathematical notation is for stating mathematical facts or propositions. That’s different from the purpose of the code you would write to implement deep-learning algorithms.

tnecniv · on Jan 2, 2024

The last part was a big hurdle for me as an early undergrad. I was a fairly strong programmer toward the end of high school, and was trying to think of math as programming. That worked for the fairly algorithmic high school stuff and I got good grades, but it meant I was awful at writing proofs. I also went through a phase where I used every piece of logical notation, and every rule for manipulating it, that I could in order to make proofs more algorithmic to me, but that both didn’t work well for me and produced some downright unreadable results.

Mathematical notation is really a shorthand for words, like you’d read text. The equals sign is literally short for equals. The added benefit, as others have pointed out, is that a good notation can sometimes be clearer than words because it makes certain conclusions almost obvious. You’ve done the hard part in finding a notation that captures exactly the idea to be demonstrated in its encoding, and the result is a very clean manipulation of your notation.

HybridCurve · on Jan 2, 2024

This is essentially my problem. I started writing programs at a young age and was introduced (unknowingly) to many more advanced mathematical concepts from that perspective rather than through pure mathematics. What was it that helped break this paradigm for you?

tnecniv · on Jan 2, 2024

Really trial and error and grinding through proofs. Working through Linear Algebra Done Right was a big a-ha moment for me. Since I was self-studying over the summer (to remedy my poor linear algebra skills), I was very rigorous in making sure I understood every line of the proofs in the text and trying to mimic his style in the exercises.

In hindsight, I think the issue was that trying to map everything to programming is a bad idea, and I was doing it because programming was the best tool in my tool chest. It was a real “when all you have is a hammer, everything looks like a nail” issue for me.

j2kun · on Jan 2, 2024

I wrote a book: https://pimbook.org

You might find it useful for your situation. The PDF is pay-what-you-want if you don't feel like paying for it.

HybridCurve · on Jan 2, 2024

Ah, I think I remember bookmarking this when it was posted before. You really don't have to go very far in computing to find a frontier where most everything is described in pure mathematics, and so it becomes a substantial barrier for undiversified autodidacts in the field. The math in these areas can often be quite advanced and difficult to approach without the proper background, and so I appreciate anyone who has taken the time to make it less formidable to others.

light_hue_1 · on Jan 2, 2024

I would suggest something like https://ocw.mit.edu/courses/6-042j-mathematics-for-computer-... instead of that book.

I appreciate that some may find the book useful, but I personally don't agree with the presentation. There are too many conceptual errors in the book that you need to unlearn to make progress. For example, the book describes R^2 as a "pair" of real numbers. This is very much untrue and that kind of thinking will lead you even further astray.

I say this as someone with a math/cs degree and PhD having taught these topics to hundreds of students.

woolion · on Jan 2, 2024

>For example, the book describes R^2 as a "pair" of real numbers.

I naturally auto-corrected this "(the set of) pairs of real numbers". If that's the case, then I don't see how this differs from the actual definition. What is the conceptual error? Is it the missing 'set of'?

j2kun · on Jan 3, 2024

> For example, the book describes R^2 as a "pair" of real numbers.

From page 15:

> The one piece of new notation is the exponent on R^2. This means "pairs" of real numbers.

Your interpretation of this quote is uncharitable at best. Using it to make a blanket assertion about the book is just silly, and quite out of the spirit of mathematics.

In particular, page 19 has an example of the kind of things that my book has that other books don't: a discussion of the soft skills of learning math and the cultural acclimation process:

> Though it sometimes makes me cringe to say it, give the author the benefit of the doubt. When things are ambiguous, pick the option that doesn’t break the math. In this respect, you have to act as both the tester, the compiler, and the bug fixer when you’re reading math. The best default assumption is that the author is far smarter than we are, and if you don’t understand something, it’s likely a user error and not a bug. In the occasional event that the author is wrong, it’s often a simple mistake or typo, to which an experienced reader would say, “The author obviously meant ‘foo’ because otherwise none of this makes sense,” and continue unscathed.

The course you suggested is the sort of "grab bag of topics" course, meant to cram the basics of every topic a CS major might want to know for doing the kind of CS theory research that MIT cares about. If you find math hard, I doubt that will make it much easier, but it could be good to do alongside a book like mine if you find my book too easy.

bawolff · on Jan 2, 2024

I don't know that that has anything to do with programming.

Arithmetic and writing proofs are very different skills. There is going to be a gap for everyone.

tnecniv · on Jan 2, 2024

Yeah I know it’s a common challenge. I think it took me a bit longer than some of my peers because I was trying to force it to be like something I knew instead of meeting it on its own terms.

When all you have is a hammer and all that

Koshkin · on Jan 2, 2024

> Mathematical notation is for stating mathematical facts or propositions.

And as such it is way too often abused. Because the (original, and the most useful) purpose of mathematical notation is to enable calculation, i.e., in a general sense, to make it possible to obtain results by manipulating symbols according to certain rules.
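As a rough illustration of "obtaining results by manipulating symbols according to certain rules", here is a toy Python sketch (the tuple encoding and function names are assumptions made up for this example). It differentiates an expression using only the sum and product rules, without ever interpreting what the symbols mean until the final evaluation.

```python
def diff(expr, var):
    """Differentiate a tiny expression tree with respect to var, purely by rewrite rules."""
    if isinstance(expr, (int, float)):
        return 0                          # constant rule
    if isinstance(expr, str):
        return 1 if expr == var else 0    # variable rule
    op, left, right = expr
    if op == "+":                         # sum rule
        return ("+", diff(left, var), diff(right, var))
    if op == "*":                         # product rule
        return ("+", ("*", diff(left, var), right), ("*", left, diff(right, var)))
    raise ValueError(f"unknown operator: {op}")


def evaluate(expr, env):
    """Give the symbols a meaning only at the very end."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    a, b = evaluate(left, env), evaluate(right, env)
    return a + b if op == "+" else a * b


# x*x + 3*x, whose derivative is 2*x + 3
expression = ("+", ("*", "x", "x"), ("*", 3, "x"))
derivative = diff(expression, "x")
value_at_2 = evaluate(derivative, {"x": 2})
```

The derivative comes out unsimplified, but evaluating it at x = 2 gives the same value as 2x + 3.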

layer8 · on Jan 2, 2024

I see the steps of a calculation as stating a sequence of mathematical facts, so that’s just an instance of the general definition.

Koshkin · on Jan 2, 2024

Sure, but the whole point is to avoid the need to do that! Manipulating symbols is the way to automate reasoning, i.e. to get to a result while completely ignoring said "facts." Using the symbols to merely "state the facts" is abuse (of the reader, mostly).

tnecniv · on Jan 1, 2024

So this is a book written by applied mathematicians for applied mathematicians (they state in the preface it’s for scientists, but some theoretical scientists and engineers are essentially applied mathematicians). As a result, both the topics and the presentation are biased towards those types of people. For example, I’ve never seen anyone in practice worry about the existence and uniqueness conditions for their gradient-based optimization algorithm in deep learning. However, that’s the kind of result those people do care about and academic papers are written on the topic. The title does say that this is a book on the theoretical underpinnings of the subject, so I am not surprised that it is written this way. People also don’t necessarily read these books cover-to-cover, but drill into the few chapters that use techniques relevant to what they themselves are researching. There was a similarly verbose monograph I used to use in my research, but only about 20-30 pages had the meat I was interested in.

This kind of book is more verbose than I like, both in terms of rigor and content. For example, they include Gronwall’s inequality as a lemma and prove it. The version that they use is a bit more general than the one I normally see, but Gronwall’s inequality is a very standard tool in analyzing ODEs and I have rigorous control theory books that state it without proof to avoid clutter (they do provide a reference to a proof). A lot of this verbosity comes about when your standard of proof is high and the assumptions you make are small.

aurareturn · on Jan 2, 2024

Are there any books you recommend for deep learning that are written for developers who don't use math every day?

I suppose the goal would be to understand deep learning so that we know enough of what's going on but not to get stuck in math concepts that we probably don't know and won't use.

stefr- · on Jan 2, 2024

I am/was in this scenario. I'm sure there are other resources out there specifically aimed at developers, but a book I'm reading now is "Deep Learning From Scratch" by Seth Weidman. He takes a different approach, by explaining concepts in three distinct methods: a mathematical way, by using diagrams and by showing the code.

I like this approach because it allows me to connect the math to the problem, whereas otherwise you wouldn't have.

In the book, you're slowly creating a DL framework, as the title says, from scratch. He also has all the code on GitHub: https://github.com/SethHWeidman/DLFS_code

I think if you are truly trying to understand deep learning, you will never get to avoid the math because that's really what it is at its core, a couple of (non-linear) functions chained together (obvious gross oversimplification).
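To make the "functions chained together" picture concrete, here is a minimal pure-Python sketch (the weights and helper names are made up for illustration; this is not code from the book's repository). It builds a tiny two-layer network as a plain composition of an affine map, a ReLU, and another affine map.

```python
from functools import reduce


def affine(weights, biases):
    """Return the function x -> W x + b for a fixed weight matrix and bias vector."""
    def layer(x):
        return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return layer


def relu(x):
    """Elementwise non-linearity."""
    return [max(0.0, xi) for xi in x]


def compose(*functions):
    """compose(f, g, h)(x) == h(g(f(x))): apply the functions left to right."""
    return lambda x: reduce(lambda acc, f: f(acc), functions, x)


# Hypothetical weights, chosen only so the arithmetic is easy to follow.
network = compose(
    affine([[1.0, -1.0], [0.5, 2.0]], [0.0, -1.0]),  # 2 inputs -> 2 hidden units
    relu,
    affine([[1.0, 1.0]], [0.5]),                      # 2 hidden units -> 1 output
)

output = network([2.0, 1.0])
```

Training then amounts to adjusting the numbers inside the `affine` layers, which is where the calculus comes in.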

lr1970 · on Jan 2, 2024

The last commit in the repo of "Deep Learning from Scratch" was 5 years ago. It is hopelessly outdated. The field is changing very fast.

aabajian · on Jan 1, 2024

All three authors are PhDs or PhD-candidates in mathematics. The notation is extremely dense. I'm curious who their target audience of "students and scientists" is for this book.

tnecniv · on Jan 2, 2024

Likely graduate students with a very theoretical interest. Some theoretically-oriented scientists and engineers are also basically applied mathematicians. It is presumably targeted at people that want to further develop the theoretical aspects of learning, as opposed to applied practitioners

angra_mainyu · on Jan 1, 2024

I had a bunch of classes in undergrad (physics) that had basically the same notation and style.

ivancho · on Jan 2, 2024

I have a strong mathematical background, and I found the notation completely insane. Right out of the gate in chapter 1 we get a definition that has subscript indices in the subscript index and a summation with subscripts in the superscript, and then composed in a giant function chain. Later we get to 4-level subscripts deep, invent at least 3 new infix operators, define 30 new symbols from 3 different alphabets and we're barely at page 100 out of 600. I have no idea who is supposed to follow and digest this

chongli · on Jan 2, 2024

I’m not sure what specialization of math you studied, but using superscripts for indices is pretty common where you’re dealing with multi-dimensional objects. I used it in a lot of the courses in my degree.

ivancho · on Jan 2, 2024

I have no problem with superscripts. Here are a couple of examples of what I am talking about:

  \left(\Psi_{L} \circ \mathcal{A}_{l_{L}, l_{L-1}}^{\theta, \sum_{k=1}^{L-1} l_{k}\left(l_{k-1}+1\right)} \circ\right. & \Psi_{L-1} \circ \mathcal{A}_{l_{L-1}, l_{L-2}}^{\theta, \sum_{k=1}^{L-2} l_{k}\left(l_{k-1}+1\right)} \circ \ldots \\
  & \left.\ldots \circ \Psi_{2} \circ \mathcal{A}_{l_{2}, l_{1}}^{\theta, l_{1}\left(l_{0}+1\right)} \circ \Psi_{1} \circ \mathcal{A}_{l_{1}, l_{0}}^{\theta, 0}\right)

and

  x_{\mathcal{L}(\Psi)+k-1} & =\mathfrak{M}_{a \mathbb{1}_{(0, L)}(\mathcal{L}(\Psi)+k-1)+\mathrm{id}_{\mathbb{R}} \mathbb{1}_{\{L\}}(\mathcal{L}(\Psi)+k-1), \mathbb{D}_{k}(\Phi)}\left(\mathcal{W}_{k, \Phi} x_{\mathcal{L}(\Psi)+k-2}+\mathcal{B}_{k, \Phi}\right)

and sure, I can figure it out, but you have to agree there are some readability issues
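For readers more comfortable with code, here is a hedged Python sketch of one reading of the first expression: all parameters live in one flat vector θ, and the second superscript on each affine map is the offset at which that layer's parameters start, namely the sum of l_j(l_{j-1}+1) over the earlier layers. The row-major weights-then-biases layout, the elementwise activations, and all function names are assumptions made for illustration, not taken from the book.

```python
def layer_offsets(widths):
    """widths = [l_0, ..., l_L]. Layer k starts at sum_{j=1}^{k-1} l_j * (l_{j-1} + 1)."""
    offsets, total = [], 0
    for k in range(1, len(widths)):
        offsets.append(total)
        total += widths[k] * (widths[k - 1] + 1)  # l_k * l_{k-1} weights plus l_k biases
    return offsets, total


def affine_from_flat(theta, offset, n_out, n_in, x):
    """Assumed layout: n_out * n_in weights in row-major order, then n_out biases."""
    bias_start = offset + n_out * n_in
    return [
        sum(theta[offset + i * n_in + j] * x[j] for j in range(n_in)) + theta[bias_start + i]
        for i in range(n_out)
    ]


def forward(theta, widths, activations, x):
    """Psi_L o A_L o ... o Psi_1 o A_1, with activations = [Psi_1, ..., Psi_L]."""
    offsets, total = layer_offsets(widths)
    if len(theta) < total:
        raise ValueError("theta is too short for these layer widths")
    for k in range(1, len(widths)):
        x = affine_from_flat(theta, offsets[k - 1], widths[k], widths[k - 1], x)
        x = [activations[k - 1](t) for t in x]
    return x


def relu(t):
    return max(0.0, t)


def identity(t):
    return t


widths = [2, 3, 1]                      # l_0 = 2, l_1 = 3, l_2 = 1
offsets, total = layer_offsets(widths)  # second layer starts at l_1 * (l_0 + 1)
theta = [0.0] * total
theta[12] = 5.0                         # the bias of the single output unit
result = forward(theta, widths, [relu, identity], [1.0, -2.0])
```

With widths 2, 3, 1 the first layer uses entries 0 to 8 of θ and the second uses entries 9 to 12, which is what the nested sums in the superscripts are encoding.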

tsimionescu · on Jan 2, 2024

They are not complaining about superscripts for indices, but about having subscripts in those superscripts. Basically like x² but the ² has a subscript of its own. That is very dense and graphically hard to follow as notations go.

outrun86 · on Jan 1, 2024

I’m just wrapping up a PhD in ML. The notation here is unnecessarily complex IMO. Notation can make things easier, or it can make things more difficult, depending on a number of factors.

angra_mainyu · on Jan 1, 2024

Really? Coming from physics (B.Sc only) the notation is refreshingly familiar and straightforward. My topology and analysis classes were basically like this.

In fact, this pdf is literally the resource I've been searching for as many others are far too ambiguous and handwavey focusing more on libraries and APIs than what's going on behind the scenes.

If only there were a similar one for microeconomics and macroeconomics, I'd have my curiosity satiated.

youainti · on Jan 1, 2024

As a PhD econ student, the mathematics just comes down to solving constrained optimization problems. Figuring out what to consider as an optimand and the associated constraints is the real kicker.

tnecniv · on Jan 1, 2024

It depends on what you’re doing. That is accurate for, say, describing the training of a neural network, but if you want to prove something about generalization, for example (which the book at least touches on from my skimming), you’ll need other techniques as well

angra_mainyu · on Jan 1, 2024

If you're referring to micro/macro, I meant more like a mathematical introduction to the models.

I recall giving Mankiw a try and wished I could just find a physics-style textbook as I found it way too wordy.

zwaps · on Jan 2, 2024

Most economists (who write these sorts of textbooks) have some sort of math background. The push to find the most general "math" setting has been an ongoing topic since the 50's and so you can probably find what you are looking for. It's not part of undergraduate textbooks since adding generality gives better proofs but often adds "not that much" to insight. Nevertheless, the standard micro/macro models are just applications of optimization theory (lattice theory typically for micro, dynamical systems for macro). Game theory (especially mechanism design) is a bit of a different topic, but I suppose that's not what you are looking for.

E.g., micro models are just constrained optimization based on the idea of representing preference relations over abstract sets with continuous functions. So obviously, the math is then very simple. This is considered a feature. You can also use more complex math, which helps with certain proofs (especially existence and representation).
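A small Python sketch of what "just constrained optimization" looks like in the simplest textbook case (the Cobb-Douglas utility, prices, and income here are illustrative assumptions, not taken from the linked book). A consumer maximizes u(x, y) = x^α y^(1-α) subject to the budget p_x x + p_y y = m; the known closed-form demand is compared against a brute-force search along the budget line.

```python
def utility(x, y, alpha):
    """Cobb-Douglas utility: a continuous function representing the preferences."""
    return (x ** alpha) * (y ** (1 - alpha))


def demand_closed_form(alpha, px, py, income):
    """Standard Cobb-Douglas result: spend the share alpha of income on x, the rest on y."""
    return alpha * income / px, (1 - alpha) * income / py


def demand_grid_search(alpha, px, py, income, steps=10_000):
    """Brute force: walk along the budget line and keep the bundle with highest utility."""
    best_bundle, best_utility = None, -1.0
    for s in range(steps + 1):
        x = (income / px) * s / steps
        y = (income - px * x) / py      # spend whatever is left on y
        u = utility(x, y, alpha)
        if u > best_utility:
            best_bundle, best_utility = (x, y), u
    return best_bundle


# Hypothetical numbers.
alpha, px, py, income = 0.25, 2.0, 5.0, 100.0
x_star, y_star = demand_closed_form(alpha, px, py, income)
x_grid, y_grid = demand_grid_search(alpha, px, py, income)
```

The grid search lands on (or right next to) the closed-form bundle; the harder modelling questions are which objective and which constraints to write down in the first place.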

You could grab some higher level math for econ textbooks, which typically include the models as examples, where you skip over the math.

For example, for micro, you can get the following: https://press.princeton.edu/books/hardcover/9780691118673/an... I think it treats the typical micro model (up to oligopoly models) via the first 50 or so pages while explaining set theory, lattices, monotone comparative statics with Tarski/Topkis etc.

QuesnayJr · on Jan 2, 2024

Debreu's Theory of Value

outrun86 · on Jan 1, 2024

Bishop’s Pattern Recognition and Machine Learning is one example that has tremendous depth and much clearer notation. Deep Learning by Goodfellow et al. is another example, albeit with less depth than Bishop.

I’m glad you’re enjoying the book. The approach is ideal for a very small subset of the ML population, no doubt that was their intention. I’m just weighing in that it’s entirely possible to cover this material with rigour yet much simpler notation. Even as someone who could parse this I’d go with other options.

jeffhwang · on Jan 2, 2024

Thanks for highlighting Bishop to me! I've self-taught through various resources esp. Goodfellow et al 2016. It's taken me a number of years to rebuild my math knowledge so that I feel comfortable with Goodfellow's treatment and look forward to learning from the Bishop book. Fwiw, I've found the math notation in the Goodfellow textbook to be among the best I've ever seen in terms of consistency and clarity. Some other books I enjoy, for example, do not seem to make any typographic indication of whether an object is a vector, scalar, or other. :(

p1esk · on Jan 2, 2024

FYI, Bishop just released an updated DL book: https://www.bishopbook.com/

HybridCurve · on Jan 2, 2024

I appreciated the notation in Goodfellow book as well, it was easy enough for me to follow without having a strong mathematics background. I'll agree however with others that this text is instead focused for a different audience and purpose.

t_mann · on Jan 2, 2024

Re your question on economics books, I think Advanced Macroeconomics by David Romer could fit your bill. It goes a lot into why the math is the way it is (arguably more interesting, like another poster said). Modern macroeconomics is also built on microeconomics, and to that extent it's covered in the book, so you're sort of getting two-for-one here.

ceh123 · on Jan 1, 2024

As someone that’s in the later stages of a PhD in math, given the title starts with “Mathematical Introduction…”, the notation feels pretty reasonable for someone with a background in math.

Sure I might want some slight changes to the notation I found skimming through on my phone, but everything they define and the notation they choose feels pretty familiar and I understand why they did what they did.

Mirroring what someone else said, this is exactly the kind of intro I’ve been looking for for deep learning.

godelski · on Jan 2, 2024

Is it fair to call something an introduction if it uses math from an upper-division undergrad math curriculum, such as metric theory? My opinion is that it is context driven, e.g. Introduction to Differential Geometry or Introduction to Homotopy Theory. But I think you can't look at the title and infer prerequisites that are within the ballpark. I'd wager most people outside math and some physics students aren't familiar with Galerkin methods (maybe a handful of engineers are) at the undergraduate level. I don't think most outside math and physics even learn PDEs (my engineering friends mostly didn't and my uni's CS program doesn't even require DE).

WhitneyLand · on Jan 2, 2024

What percent of LLM knowledge requires proficiency in anything you mentioned?

From what I’ve seen it’s a small percentage, and there’s no reason for most people to be put off by it.

Everyone come on in the water is fine.

godelski · on Jan 2, 2024

Between 0% and idk 70%? depending on what you're doing.

WhitneyLand · on Jan 2, 2024

Looking at the theory as a whole it’s a very small minority.

I’m trying to think if it’s 0 percent outside of backprop…

Arguably high school math gets you quite a bit of understanding. After that in descending order I’d guess Linear Algebra, Statistics/Probability, Basic Calculus, Partial Derivatives…

In other words it’s not all or nothing. The easiest stuff gets you a lot of bang for your buck.

godelski · on Jan 2, 2024

Are you a researcher in ML? What is your focus? I'm in image synthesis/explicit density modeling.

conformist · on Jan 1, 2024

Yes, it's easier for mathematicians, because a lot of background knowledge and intuition is encoded in mathematical conventions (eg "C(R)" for continuous functions on the reals etc...). Note that this is probably a book for mathematicians.

strangedejavu2 · on Jan 1, 2024

It's not too difficult to understand, but this introduction isn't written with pedagogy in mind IMO

HybridCurve · on Jan 2, 2024

This is probably the most succinct explanation, and as an experienced perl developer, I admire your brevity.

joshuanapoli · on Jan 1, 2024

Mathematical notation usually has a problem with preferring single-letter names. We usually prefer to avoid highly abbreviated identifier names in software, because they make the program harder to read. But they’re common in Math, and I think that it makes for a lot of work jumping back and forth to remind oneself what each symbol means when trying to make sense of a statement.

tsimionescu · on Jan 2, 2024

I think the main difference is that in programming you typically use names from your domain, like "request" or "student". But math objects are all very abstract, they don't denote any domain. For example, if I have a triangle and I want to name its vertexes so I can refer to them later, what would be a good name? Should I call them vertexA, vertexB, and vertexC just so it's not a single letter?
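A small Python sketch of the trade-off being discussed, using the triangle example (both function names are made up for illustration). The two functions compute the same area; the only difference is whether the vertices get single-letter or spelled-out names.

```python
def triangle_area(a, b, c):
    """Math-style names: a, b, c are the vertices, each an (x, y) pair."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2


def triangle_area_verbose(first_vertex, second_vertex, third_vertex):
    """The same formula with 'descriptive' names that add no domain meaning."""
    edge_to_second = (second_vertex[0] - first_vertex[0], second_vertex[1] - first_vertex[1])
    edge_to_third = (third_vertex[0] - first_vertex[0], third_vertex[1] - first_vertex[1])
    cross_product = edge_to_second[0] * edge_to_third[1] - edge_to_third[0] * edge_to_second[1]
    return abs(cross_product) / 2


short_result = triangle_area((0, 0), (4, 0), (0, 3))
long_result = triangle_area_verbose((0, 0), (4, 0), (0, 3))
```

Here the longer vertex names carry no extra information, whereas naming the intermediate quantities (the edges and the cross product) arguably does help.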

WhitneyLand · on Jan 1, 2024

Use ChatGPT.

Screenshot the math, crop it down to the equation, paste into the chat window.

It can explain everything about it, what each symbol means, and how it applies to the subject.

It’s an amazing accelerator for learning math. There’s no more getting stuck.

I think it’s underrated because people hear “LLMs aren’t good at math”. They are not good at certain kinds of problem solving (yet), but GPT4 is a fantastic conversational tutor.

godelski · on Jan 2, 2024

Don't suggest this. While I agree it can be helpful, the problem is that if you're a novice you won't be able to distinguish hallucinations, which in my experience are fairly common, especially as you get into advanced topics. If you've got good math rigor then it's extremely helpful, because often things are hard to search for exactly, but it's a potential trap for novices. But if you have no better resource, then I can't blame anyone, just give a warning to take care.

WhitneyLand · on Jan 2, 2024

That’s kind of like telling people not to go online because you can’t believe everything you read on the Internet.

What proportion of the problems you’ve encountered were with the free version vs premium? It’s a huge difference and the topic here is GPT4.

Also, since it is fairly common for you, are there any real-world examples you can share?

godelski · on Jan 2, 2024

> That’s kind of like telling people not to go online because you can’t believe everything you read on the Internet.

Uhhh... it's like telling people to trust SO over reddit, especially a subreddit known to lie.

> What proportion of the problems you’ve encountered were with the free version vs premium? It’s a huge difference and the topic here is GPT4.

Both. Can we stop doing this? This is a fairly well established principle with tons of papers written about it, especially around math. Just search arxiv, there's a new one at least every week

WhitneyLand · on Jan 2, 2024

I’ll take that as it happens so infrequently with GPT4 you have no illustrative prompts that can be shared.

There have not been tons of papers written about this.

You seem to be conflating papers about GPT4 as a solver with it as a math tutor. It’s a completely different problem space.

godelski · on Jan 3, 2024

Or you can check the front page:

https://news.ycombinator.com/item?id=38845878

(older but similar) https://news.ycombinator.com/item?id=37904047

WhitneyLand · on Jan 3, 2024

I don’t get the relevance those seem to be security related?

My main point consistently has been that GPT4 can be an invaluable resource specifically for learning math subjects.

I am not aware of any papers, studying people using it as a conversational tutor for learning math and having problems with hallucinations.

godelski · on Jan 3, 2024

> I don’t get the relevance those seem to be security related?

And?

> My main point consistently has been that GPT4 can be an invaluable resource specifically for learning math subjects.

This can also be true. I use it a lot. Don't confuse openly discussing limitations with calling it a pile of shit. No need to have only two extremes.

> I am not aware of any papers, studying people using it as a conversational tutor for learning math and having problems with hallucinations.

Very bad faith requirement. Unless you have good evidence that GPT hallucinates in many domains (as exemplified by said security report) and NOT math tutoring. If you have this really strong evidence that math tutoring is specifically unique then I suggest writing a paper. I'll help if you really can do it and be happy to give you first author and be proven wrong. But a much easier explanation is that math tutoring is not unique to GPT with regard to generating hallucinations. If you truly believe you do need an extremely specific example, you may need to pull the wool off your eyes. But I'm hoping you don't and are just arguing.

CamperBob2 · on Jan 2, 2024

It works better than you think, as long as you use GPT 4. See my answer to the other person (https://news.ycombinator.com/item?id=38837646).

A lot of negativity comes from people who goofed around with 3.X for a while, came away unimpressed, muttered something under their breath about stochastic parrots or Markov chains that sounded profound (at least to them), and never bothered to look any further. 4 is different. 4 is starting to get a bit scary.

The real pedagogical value comes when you try to reconcile what it tells you about the equations with the equations themselves. Ask for clarification when something seems wrong, and there is an excellent chance it will catch its own mistakes.

godelski · on Jan 2, 2024

That answer isn't very compelling as it is one of the most well known equations in ML. There are some very minor errors but nothing that changes the overall meaning. But you even seem to agree with me in your followup: don't rely on it, but use it. I'm only slightly stronger than you.

And stop all this 3.5 vs 4 nonsense. We all know 4 is much better. But there's plenty of literature that shows its limits, especially around memorization. You also don't understand stochastic parrots, but in fairness, seems like most people don't. LLMs start from compression algorithms and they are that at their core. But this doesn't mean it is a copy machine despite the NYT article but it also doesn't mean it is a thinking machine like the baby AGI people. Truth is in between but we can't have a real conversation because hype primed us to just bundle people into two camps and make us all true believers. Just please stop gaslighting people when they say they have run into issues. The machine is sensitive to prompts, so that can be a key difference or sometimes they might just see mistakes you don't. It's not an oracle so don't treat it like one. And don't confuse this criticism as saying LLMs suck, because I use them almost every day and love them. I just don't get why we can't be realistic about their limits and can only believe they are a golden goose or pile of shit. It's, again, neither.

CamperBob2 · on Jan 2, 2024

> You also don't understand stochastic parrots

You have a parrot that can paint original pictures, compose original songs and essays, and translate math into both English and program code?

I would like to buy your parrot. I'll keep it in my Chinese room. There used to be a guy in there, but he ran away screaming something about a basilisk.

godelski · on Jan 2, 2024

> You have a parrot that can paint original pictures, compose original songs and essays, and translate math into both English and program code?

Kinda, kinda, yes, and yes.

I think there's far less originality than most people think. But it's not surprising when your job isn't leading you to look at thousands of pictures a day. I have yet to see a generative model that isn't pulling heavily towards the training data and you might be noticing the memorization rates are getting higher. But yes, a stochastic parrot doesn't mean memorization, it is about generalization and the stability around the p-norm ball around the training data.

Btw, what's wrong with a stochastic parrot? They are absolutely fucking useful. I use them every day. Hell, I even use things that are complete memorizations and all compression every day. What's with everyone equating powerful statistical systems with uselessness. Anyone saying that they aren't extremely useful is pulling wool over their eyes (but the same is true for anyone claiming baby AGI).

I'd also appreciate it if you discussed in good faith. The snarkiness is not appreciated.

CamperBob2 · on Jan 2, 2024

I'm not being snarky! I genuinely feel I'm the one being gaslighted, by people telling me I shouldn't be utterly blown away by answers like the earlier example, or the one I just received:

https://i.imgur.com/JSWLFOi.png

I regularly get downvoted and criticized for suggesting this tool to other students, in defiance of what I can clearly see happening with my own eyes. I see a tool that, if developed further, will answer much deeper questions, including original ones, just as accurately and effectively. One that appears capable of taking humanity to the next level so fast it will make the monolith in 2001 look like an abacus by comparison.

Meanwhile, you tell me, "Don't suggest this to other students, it might hallucinate." Other people say, "Shut this down at once (or nerf it beyond any possible utility), it might hurt somebody's feelings." Another contingent warns, "Shut this down at once, it might start a nuclear war." Still other people say, "Shut this down at once, it violates copyright law." The objections just get dumber from there, yet gain traction by the day.

There's never been a time when standing in the way of something like this was right. Why should I think it's time to do so now? (And yes, I acknowledge that you're not personally 'standing in the way', but it really bugs me when people who claim they aren't 'standing in the way' of the technology tell other people not to use it.)

> I have yet to see a generative model that isn't pulling heavily towards the training data

When's the last time you saw a human mind that didn't work that way? (Or, for that matter, a parrot's mind.) The real truth behind the stochastic-parrot metaphor is that parrots, stochastic or otherwise, are nothing all that special, and neither are we. We're just better at using tools than the birds are, that's all.

Or at least we were up until now. But muh COPYRITE!!!11! ...

godelski · on Jan 2, 2024

> I genuinely feel I'm the one being gaslighted, by people telling me I shouldn't be utterly blown away by answers like the earlier example, or the one I just received:

I think people in my camp (who are often confused with the Gary Marcus camp) aren't saying you shouldn't be blown away. Those people wouldn't say this:

> And don't confuse this criticism as saying LLMs suck, because I use them almost every day and love them. I just don't get why we can't be realistic about their limits and can only believe they are a golden goose or pile of shit. It's, again, neither.

Fwiw, I give those people an even harder time. They deny utility that is quite apparent. They also have these silly contrived doomer arguments that don't make any sense, as if one day AGI is just going to unexpectedly appear out of nowhere and, like you suggest, somehow jump the airgap and get control of the world's nuclear weapons without anyone noticing. What an insane hypothesis that doesn't have any substantial evidence and is entirely based on "but what if!" It is conspiratorial and a distraction from the real harm these systems can do, which is far more subtle and not really an existential crisis (at least arguably in the same way, but let's not get into that). Some of these people are shills and some are useful idiots/true believers. You're right to not pay attention to them.

I'll also mention that I too am blown away. But you can be blown away and still have criticism and be wary of a thing too. The answer is quite impressive, without a doubt. I mean we are literally putting lightning into rocks and making them capable of doing math and speaking human languages. If you're not blown away by any single one of those things then it is simply a lack of imagination.

> When's the last time you saw a human mind that didn't work that way?

Quite frequently. Same with even my cat, and she's dumb as shit. Probably ran into too many walls while chasing toys but I think that's just a feedback loop lol. She's dumb as shit but I'm also absolutely blown away by her brilliance. It may be hard to see that both those can be true, but that's the true state. But I disagree that there isn't anything special about stochastic parrots, any animal, or humans. They are all mind-bogglingly impressive; it's just that our brains are designed to normalize things so as not to be overburdened by the computational load (which itself is impressive!).

You are absolutely right though that there's a ton of exploitation that humans do (referring to exploration vs exploitation). I said memorization is incredibly useful. But creativity is far more subtle. I should put it this way: chimps (very impressive creatures) are far better at memorization than most humans. But they are nowhere near as creative. Certainly some creativity is leveraging prior works for inspiration. But a subtle aspect of this is that often when this form is considered brilliant it crosses domains, which is something no ML seems to even have the capacity to perform. This can be hard to know though, because unless you have domain knowledge you may not have heard about how Einstein was called a mathematician and not a physicist, or how Nash was said to "just use topology". This type of lore is important if we're going to discuss actual intelligence but not important for tools or our everyday lives. The devil is in the details when we care about details.

It can be really hard to understand these distinctions. You have to look REALLY close at details. One thing I'll mention is that I know I have looked at the datasets we use in our group far more than anyone else that I know. This is unsurprisingly an uncommon thing because it is boring to look at the raw data and investigating things like LAION takes herculean efforts (something I haven't even approached). But your example is actually remarkably relevant to this topic. You couldn't have done anything better! Most people rely on measurements of distance like cosine similarity or L2 to determine duplicates or near duplicates. But ask GPT this (you should get the right answer): "How does the curse of dimensionality relate to distance measurements in higher dimensions? Are there any problems this creates?" Or ask it another one, whose answer blew me away the first time I heard it despite being absolutely obvious after I took just a moment to think about it: "If I have an n-dimensional space, where n is very large, what is the expected angle between any two random tensors? What is it as n approaches infinity?" I'm positive it will again give you the correct answer.
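For anyone who would rather check those two questions numerically than ask GPT, here is a rough sketch using only the Python standard library. It assumes "random" means i.i.d. standard normal components (so directions are uniform on the sphere); the function names `angle_stats` and `relative_contrast` are made up for this illustration. By symmetry the mean angle is 90 degrees in any dimension; what changes with n is how tightly the angles cluster around 90, and how little the nearest and farthest L2 distances differ.

```python
import math
import random


def gaussian_vector(n, rng):
    """Draw n i.i.d. standard normal components (assumption: this is what 'random' means)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def angle_degrees(u, v):
    """Angle between u and v in degrees, computed from their cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # clamp rounding error
    return math.degrees(math.acos(cosine))


def angle_stats(n, pairs=200, seed=0):
    """Mean and standard deviation of the angle over `pairs` random pairs in n dimensions."""
    rng = random.Random(seed)
    angles = [
        angle_degrees(gaussian_vector(n, rng), gaussian_vector(n, rng))
        for _ in range(pairs)
    ]
    mean = sum(angles) / len(angles)
    variance = sum((a - mean) ** 2 for a in angles) / len(angles)
    return mean, math.sqrt(variance)


def relative_contrast(n, points=200, seed=0):
    """(farthest - nearest) / nearest L2 distance from one random query to random points."""
    rng = random.Random(seed)
    query = gaussian_vector(n, rng)
    dists = [math.dist(query, gaussian_vector(n, rng)) for _ in range(points)]
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    for n in (2, 10, 100, 1000):
        mean, std = angle_stats(n)
        contrast = relative_contrast(n)
        print(f"n={n:5d}  mean angle={mean:6.2f}  std={std:6.2f}  contrast={contrast:8.3f}")
```

The spread of the angle shrinks roughly like 1/sqrt(n), so in high dimensions nearly every pair looks orthogonal, and the relative contrast shrinks too, which is why thresholding cosine similarity or L2 to find near duplicates gets harder as n grows. Vectors with non-negative components (many embeddings, raw pixel values) will not concentrate at 90 degrees, so treat this as an illustration of the idealized case only.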

But you also have to realize that this is frequently written about and without a doubt in the training data. You can absolutely overfit models and have them be incredibly useful. But the difference is that this won't be generalization and will be brittle. For a long time GPT was not able to correctly answer "Which weighs more, a pound of feathers or a kilogram of bricks" because it was too sensitive to the expected answer (it'll work now btw). It still has problems with a variation of the corn, goose, fox river crossing puzzle if you change it to allow all items in the boat at once (at least when I checked a month ago). But these are not the actions of sentient creatures, ones that can think and comprehend. You're going to have to think really hard about how you think, and especially how you think really hard, to get a good understanding of this. But it comes down to the reason why someone can be absolutely brilliant while shockingly idiotic. This is not the quip from I, Robot with the "can you?" about art and symphonies. There is something deeper and, truth be told, many animals do things for no good reason (one that can't be clearly defined by our perceived loss functions, which may accurately be called emergent behaviors). Every mammal is also able to run complex simulations in its mind, at incredibly low computational cost. Even the small rat will twitch its legs while it sleeps or your dog may bark, being unable to distinguish reality from a dream, just as you do. That is truly a world model. Something we aren't remotely close to in AI, but that's okay. Why would it not be okay?

But in some way you are being gaslit, just not by what I intended to say (though maybe by how you read it, for which I apologize. I am trying to work on communicating better, but it is hard when we have a diverse global audience with many different base assumptions and I have to guess which type of imputation I need to direct my message at). There are plenty of people with heavily vested interests in selling you these tools as far more than they are. I've written a few comments before that what's going on is as if we made a chocolate factory. One that sells the best god damn chocolate you've ever tasted. But then they started selling the chocolate as a cure for cancer. At that point, it doesn't matter how good the chocolate is, people will feel disenfranchised. Some people are responding by saying that the chocolate tastes like shit while others are saying it cured their cancer. But neither of these is true. It's damn good chocolate, but it isn't going to cure cancer. (ML certainly will be a very useful tool for tackling cancer. That was not the intent of this analogy.) I just think there's this fear that people have that if something isn't a literal gift from god then it is a pile of shit, and I don't get it. Nothing we have fits that description, but we have done and created so many incredible things as humans and made such leaps and bounds with these half-baked, incomplete things. There is nothing wrong with just okay chocolate, but the chocolate we have is, without a doubt, better than just okay.

Does that make more sense?

talentedcoin · on Jan 1, 2024

[flagged]

CamperBob2 · on Jan 1, 2024

Honestly, because the very first sentence of the preface is "This book aims to provide an introduction to the topic of deep learning algorithms." Really? LOL. If you're going to pitch 600 pages of dense mathematical notation as "introductory," you're going to have to expect some people to call BS.

What's interesting/unfortunate is that their Python code samples really are easy to follow and pedagogically useful to a beginner. I think a lot of people will be turned off by the text unnecessarily.

It should have been promoted as a rigorous reference textbook, which is what it is, and not any sort of tutorial or primer.

andrepd · on Jan 1, 2024

Obligatory hn comment on any math-related topic: "notation bad"

Please be more original.