Hacker News: randlet's comments

/r/epstein post from the creator:

https://reddit.com/r/Epstein/comments/1r3joqr/i_mapped_every...

-------

A week ago I posted about an open database I’ve been building to cross-reference Epstein case material. That post did way better than I expected (568k views, 4.6k upvotes) and it hugged my server to death twice.

Since then I basically did nothing but ingest, clean, and index more data. The database is now big enough that “just read the docs” is not advice, it’s a cry for help.

What it was last week

    ~6,000 documents
    1,708 flights
    2,700 emails
    1,438 people
What it is now

    1,522,060 documents (all DOJ releases we have access to so far), full text searchable
    1,708 flights (1997 to 2019) with manifests where available
    10,000+ emails indexed with threading
    1,350 people (cleaned: removed duplicates + nuked a bunch of false connections)
    638,000 docs run through redaction analysis
        ~1.8M individual redactions detected
        ~616k flagged by our tooling as “looks questionable, take a closer look”
        ~39,500 pages of text recovered from under black bars (you can see examples on the site)
    107,000 named entities pulled out via NLP (people, orgs, places, dates)
    1,530 audio/video transcripts
    4,300+ photos/media (raid photos, exhibits, property shots, government releases)
That’s not a typo: 1.5 million documents. If you search a phrase, it searches inside the actual pages (OCR where needed) and email bodies, not just titles.

So what changed, besides “everything is bigger”?

1) The redaction stuff is getting hard to ignore

I’m not saying “every redaction is evil.” Some of them obviously protect victims, minors, addresses, etc. But the patterns are weird, and the volume is insane.

I also worked with u/Sea_Doughnut_8853, who independently processed 519k PDFs with their own pipeline. That let us sanity check a lot of what we’re seeing across the corpus.

We’re flagging ~616k redactions as “potentially improper” based on patterns (context, repetition, surrounding text). That does not mean “definitely corrupt.” It means “this is the pile worth human eyes.”

We also recovered a lot of hidden text. If you want to judge it yourself, the doc pages show the redaction density and any recovered text we can reliably extract.

2) Entity extraction is the only way to deal with this scale

107,000 entities means you can stop playing whack-a-mole with PDFs. It’s still not “truth,” it’s just structure. But structure beats drowning.

3) This week’s real-world developments are in there too

If you missed the news cycle, Congress has been pressuring DOJ about redactions, and Rep. Ro Khanna read six previously redacted names on the House floor:

    Leslie Wexner
    Salvatore Nuara
    Zurab Mikeladze
    Leonic Leonov
    Nicola Caputo
    Sultan Ahmed bin Sulayem
Important caveat: being named in a document is not proof of wrongdoing. People show up in emails, contact lists, forwarded threads, or because someone mentioned them.

Related:

    Reporting says Wexner’s name appeared in an internal FBI document as a “co-conspirator,” but he has not been charged.
    Maxwell invoked the Fifth in a House Oversight deposition and her lawyer floated testimony in exchange for clemency.
    House Oversight depositions are scheduled: Wexner (Feb 18), Richard Kahn (Feb 25), Darren Indyke (Mar 5), plus Hillary Clinton (Feb 26) and Bill Clinton (Feb 27).
All of those items are indexed, with the underlying documents linked where available.

New tools since last week

    Full text search: search inside 1.5M documents, 28k OCR entries, and 10k emails
    AI research assistant: ask a question in plain English, get an answer with citations back to the source docs so you can verify it yourself
    Degrees of separation: shortest documented path between two people, with the supporting flights/docs shown at each hop
    Redaction analysis on every doc page: how heavy, what got flagged, what got recovered
    Investigation Dossiers (new today): community made evidence boards
        pin any person/doc/flight/email
        add notes
        upvotes + comments
        “community notes” style fact checks
        sorting like hot/new/top
        I put up 14 starter dossiers so it’s not an empty ghost town
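The degrees-of-separation tool is only described in one line above. Here is a minimal Python sketch of one way such a feature could work. It assumes that people are graph nodes and that each shared flight, document, or email is an edge carrying its source ID. Breadth-first search then finds a path with the fewest hops and keeps the evidence for each hop. The function name, the names, and the IDs are hypothetical placeholders, not the site's code or data.

```python
from collections import defaultdict, deque


def shortest_path(edges, start, goal):
    """Breadth-first search over an undirected person-to-person graph.

    edges maps (person_a, person_b) -> an evidence ID (hypothetical flight/doc/email IDs).
    Returns the hops as (from_person, to_person, evidence) tuples, or None if unconnected.
    """
    graph = defaultdict(list)
    for (a, b), evidence in edges.items():
        graph[a].append((b, evidence))
        graph[b].append((a, evidence))

    # previous[person] = (person we came from, evidence for that hop); start has no predecessor.
    previous = {start: None}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if person == goal:
            hops = []
            while previous[person] is not None:
                prior, evidence = previous[person]
                hops.append((prior, person, evidence))
                person = prior
            return hops[::-1]  # walked back from goal, so reverse into start-to-goal order
        for neighbor, evidence in graph[person]:
            if neighbor not in previous:  # in BFS the first visit is along a shortest path
                previous[neighbor] = (person, evidence)
                queue.append(neighbor)
    return None


# Hypothetical data: placeholder names and made-up evidence IDs.
edges = {
    ("person_a", "person_b"): "flight_001",
    ("person_b", "person_c"): "doc_042",
    ("person_a", "person_d"): "email_017",
}
print(shortest_path(edges, "person_a", "person_c"))
# [('person_a', 'person_b', 'flight_001'), ('person_b', 'person_c', 'doc_042')]
```

Keying edges by pair keeps only one evidence item per connection. A tool that shows "supporting flights/docs at each hop" would more likely store a list of every item linking the two people.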
What still bugs me

The government didn’t just withhold whole documents. In a lot of places, it looks like they blacked out specific names or transactions inside documents they did release. Maybe there are legit reasons for some of it. But at this volume, it needs scrutiny.

Also, the 2013 to 2019 passenger manifest gap is still a thing in the public record. Tons of flights, but not the corresponding names.

The database

Everything is at EpsteinExposed.com. Free. No ads. No paywall. You can browse without logging in. Accounts are only for making dossiers and posting notes.

There’s also a community forum for collab research: https://board.epsteinexposed.com

If you find errors, call them out. If you want a specific thread turned into a dossier, say the name and I’ll help you get it set up.

TL;DR

The database went from ~6k docs to 1.5M in a week. Full text searchable. We ran redaction analysis at scale, flagged a huge pile for human review, recovered a lot of hidden text, and the current Congress/DOJ redaction fight is now fully indexed in the same place.

Update:

I went to sleep thinking this would be a normal update post and woke up to it hitting r/popular / r/all.

Thank you. Seriously.

In ~4 hours this hit ~750k views, and people have already donated ~$800. That is wild. It genuinely helps keep the lights on while I keep ingesting and cleaning data, and everything goes toward making the site better!

A quick housekeeping thing because it needs to be said on posts like this:

Being named in a document is not proof of wrongdoing. People show up in emails, contact lists, forwarded threads, or because someone mentioned them.

Please don’t dox, harass, or post “I found their address” type stuff. If you want this taken seriously by journalists and agencies, it has to stay clean and source-based.

If you spot bad OCR, duplicates, broken links, or a false connection, call it out. That kind of boring cleanup work is how this gets stronger.

If you want to help, the best thing is still commenting and sharing. Second best is reporting errors or building a dossier on a specific thread so the research is organized and verifiable.

Also, small but important technical update: semantic/smart search is going live soon. Keyword search is great, but it misses anything that is phrased differently. Smart search uses a hybrid approach, so you can search by meaning, not just exact words. It’s already wired up; I’m generating the embeddings now and will seed them into the database next.
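The post doesn't say how the hybrid search combines its two result sets. One common way to merge a keyword ranking with an embedding-similarity ranking is reciprocal rank fusion (RRF). Each document scores the sum of 1 / (k + rank) over the lists it appears in. The Python sketch below assumes that technique and uses hypothetical document IDs. The constant k = 60 is a conventional RRF default, not a value from the site.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of doc IDs; each doc scores sum(1 / (k + rank))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for one query from the two retrievers.
keyword_hits = ["doc_12", "doc_7", "doc_3"]    # exact-word (full-text) matches
semantic_hits = ["doc_7", "doc_44", "doc_12"]  # nearest neighbors by embedding similarity
print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
# ['doc_7', 'doc_12', 'doc_44', 'doc_3']
```

RRF uses only ranks, so it does not need to reconcile the very different score scales of keyword relevance and cosine similarity. Documents found by both retrievers (doc_7, doc_12) rise to the top.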


In your first link, the narrator says he "doesn’t understand the physics of it," but there's really no physics involved (ignoring scatter). It’s just a consequence of the math. It’s relatively easy to understand if you think of it in terms of the surface of a sphere. There is a fixed amount of light coming from a point source, and as the light travels outward you can think of it as being spread over the surface of a sphere. Since the surface area of a sphere is 4πr², if you double the radius the area quadruples, and therefore the light intensity at any point on the sphere drops by a factor of four.
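A minimal Python sketch of the sphere argument. It assumes an ideal point source radiating equally in all directions, with no scatter or absorption. The function name and the 100 W figure are illustrative only.

```python
import math


def intensity(power_watts: float, distance_m: float) -> float:
    """Intensity (W/m^2) of an ideal point source: its power spread over a sphere of area 4*pi*r^2."""
    sphere_area = 4 * math.pi * distance_m ** 2
    return power_watts / sphere_area


# Doubling the distance quadruples the sphere's area, so intensity drops by a factor of four.
ratio = intensity(100.0, 1.0) / intensity(100.0, 2.0)
print(ratio)  # 4.0
```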

edit: And now after rtfm I see there's a nice demo of this!


I have the same and really like them, other than that the BTE ones still require an induction loop necklace for Bluetooth.


> I understand the self-interested desire for the ultra wealthy to have lower taxes on an individual level,

I don't. It seems like mental illness to me


What is so hard about understanding their desire? They want to keep more money for themselves so they can buy more yachts, invest in more things, whatever. This is just how humans are.

It's up to policy makers (i.e. government when it comes to taxation) to structure the system to mitigate the inherent self-interest that people have. Obviously that's easier said than done when the ultra-rich buy off the politicians, but human nature just is what it is.


The thing is they have so much money they can't buy enough. Even if all they do is consume, consume, consume they will still be trending upwards. In effect, every yacht has a negative price tag.


It bears the same hallmarks as any other addict: the next hit has to be even bigger than the last, and everyday enjoyments in life are practically invisible to them. Their drug of choice may be different, but the outcome on their life, relationships and society is largely the same.


> This is just how *some* humans are.


To clarify, I don't mean all humans want to keep all money for themselves even when filthy rich, I just mean that all humans are self-interested. The extent to which that governs our behavior of course differs and I agree there are people who willingly sacrifice to help others. A bit more in another comment: https://news.ycombinator.com/item?id=45683652


To clarify a bit...

I meant I understand it in the sense that to get that wealthy in the first place you have to have a special kind of self-interested sociopathy, so in that regard I understand their desire. Not that I agree with it.


Every human is self-interested. Fortunately there are a lot of good people out there who try to temper that instinct, but remembering that everybody by default has a "what's in it for me" line of thinking (even if consciously attempting to subjugate it) is important for understanding as much as we can about how the world works. I don't think a sleazy real estate agent (jacking up fees, putting people in properties that may not be in their best interest, strong-arming FSBOs into listing with them) is behaving any differently than the billionaire is; it's just the scale of their blast radius that is different. That's not to equate the two, because obviously scale matters, but the underlying motivation is largely the same.


Gotcha. I should have realized you meant that from the rest of your post.


Kinesis Freestyle Pro? I've got the non-mechanical version (the mechanical one wasn't available when I purchased) and it's held up for many years. It's a great keyboard IMO.


The Freestyle Pro is almost a good keyboard. The Esc and function keys are all offset to the left by one key compared to a standard layout, which drove me nuts. I have a Freestyle Edge RGB now, which I like much better. (Though I replaced the wrist rests with some from Goldtouch.)


I really like my Kinesis Freestyle Edge with the tenting kit. I’ve been using it for around 2 years and no complaints.


Thank you, I'll take a look.


Not to mention that the rear brake comes into play: applying it transfers weight to the front, allowing you to apply the front brake harder.
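The weight transfer under braking mentioned here is commonly approximated with a rigid-body model. The load shifted onto the front wheel is ΔF = m·a·h / L, where m is the combined mass of bike and rider, a is the overall deceleration, h is the height of the center of mass, and L is the wheelbase. A minimal Python sketch follows. It ignores suspension and aerodynamics. The 250 kg, 5 m/s², 0.6 m and 1.4 m figures are illustrative assumptions, not measurements of any particular bike.

```python
def front_load_transfer(mass_kg: float, decel_ms2: float, cg_height_m: float, wheelbase_m: float) -> float:
    """Load (newtons) shifted from the rear wheel to the front under steady braking: m * a * h / L."""
    return mass_kg * decel_ms2 * cg_height_m / wheelbase_m


# Illustrative numbers only: 250 kg bike + rider, 5 m/s^2 deceleration, 0.6 m CG height, 1.4 m wheelbase.
shift = front_load_transfer(250, 5, 0.6, 1.4)
print(round(shift))  # 536
```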


Not in my experience (20 years). If it's all-in-or-we're-dead, you either do a stoppie and hope, or do a deliberate lowside and also hope someone will patch you up once they dislodge your parts from wherever you end up.

In either case it's front brake. A bit of tilt for the second case. The rear brake is not needed at all.

If it's just an overspeed corner, you try to slow down gently while keeping both wheels on your line. So just a little play with the throttle and just a little front brake, so that the bike stays balanced, so to speak. No rear brake at all, because rolling off the throttle a bit is all that's needed for the rear wheel.

If that is not enough, you're not going to make that corner, you have had too much speed coming in, and you will pay for that right now by crashing into something.

I learned this in the 1990s when I first started riding.

Those are the basics.


With all due respect it sounds like you could use a refresher on motorcycle physics.


2% is quite low. Most of the FIRE community would consider even 3% quite conservative.


$3 million invested means you can withdraw $100,000 of annual income with virtually no risk of ever going broke. That is financial independence. $10MM is way too high a bar.
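The arithmetic behind the two comments above is just annual withdrawal divided by portfolio size. A small Python sketch follows. It checks only the numbers, not the claim that any particular rate is safe. The helper names are illustrative, and pairing $10MM with the same $100,000 of income is an assumption made for comparison.

```python
def withdrawal_rate(annual_withdrawal: float, portfolio: float) -> float:
    """Fraction of the portfolio withdrawn each year."""
    return annual_withdrawal / portfolio


def portfolio_needed(annual_withdrawal: float, rate: float) -> float:
    """Portfolio size that supports a given annual withdrawal at a given rate."""
    return annual_withdrawal / rate


print(f"{withdrawal_rate(100_000, 3_000_000):.2%}")   # 3.33%: $100,000 from $3MM
print(f"{withdrawal_rate(100_000, 10_000_000):.2%}")  # 1.00%: the same income from $10MM
for rate in (0.02, 0.03):
    print(f"{rate:.0%} -> ${portfolio_needed(100_000, rate):,.0f}")
# 2% -> $5,000,000
# 3% -> $3,333,333
```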


Talk to your doctor obviously, but you may just be running too fast. There are lots of Couch to 5K programs that start off with 30-second run intervals with walking in between.

With running you need to play the long game and slowly build up your pace, total mileage, and number of weekly training days. It's hard to be patient, but it seems to be the secret to minimizing training injuries.


I'm not a run streaker, but I am an avid runner, and if you "only" need to run a mile, you really only need to find 15 minutes in a day to keep your streak alive.

