Perhaps if we encourage more people to use privacy-preserving tools like Tor Browser, Signal, etc., then privacy for everyone will be normalized again. One can only hope.
And if you're only interested in preserving some Wiki pages, this browser extension with some automation on top will do the job perfectly: https://github.com/gildas-lormeau/SingleFile
No affiliation, just a happy user :)
Great work!
Is there a similar project for (local) text generation (NLP) on a CPU + lots of RAM? I mean something transformers-based and of similar quality to GPT-3 (i.e. better than GPT-2). I understand that each prompt would take almost forever to complete, but I'm still curious whether something like that exists.
Yes. Fabrice Bellard wrote a highly optimised library (libnc) [1] for training and inference of neural networks on CPU (x86 with AVX2), and implemented GPT-2 inference (gpt2tc) with it [2]. Later he added a CUDA backend to libnc. You can try it out at his website TextSynth [3], and I see it now runs various newer GPT-based models too, but it seems he hasn't released the code for that. That doesn't surprise me, as he didn't release the code for libnc either, just the parts of gpt2tc excluding libnc (libnc is released as a free binary), so someone could reimplement GPT-J and the other models themselves.
Incidentally, he's currently leading the Large Text Compression Benchmark using a neural-network-based compressor called nncp [4], which is based on this work. It learns the transformer-based model as it goes, and the earlier versions didn't use a GPU.
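To illustrate what "learns the model as it goes" means for a compressor, here is a minimal sketch. It is NOT how nncp works internally: it swaps the transformer for a trivial byte-frequency model (an assumption made purely for illustration) and only computes the ideal code length an arithmetic coder would get. The function name is hypothetical.

```python
import math
from collections import Counter


def adaptive_code_length_bits(data: bytes) -> float:
    """Ideal code length (in bits) under an adaptive order-0 byte model.

    Illustrative stand-in only: a real compressor like nncp would use a
    far stronger predictor here. The key idea is the same, though: each
    symbol is predicted from what was seen so far, then the model is
    updated, so no model has to be shipped with the compressed data.
    """
    counts = Counter()
    total = 0
    bits = 0.0
    for byte in data:
        # Laplace-smoothed probability from the counts seen so far.
        p = (counts[byte] + 1) / (total + 256)
        # An arithmetic coder spends about -log2(p) bits on this symbol.
        bits += -math.log2(p)
        # Update the model only AFTER coding, so a decoder can mirror it.
        counts[byte] += 1
        total += 1
    return bits


if __name__ == "__main__":
    sample = b"abababab" * 100
    print(f"{adaptive_code_length_bits(sample):.0f} bits vs {8 * len(sample)} raw")
```

The decoder can run the exact same update loop, which is why the model never needs to be transmitted separately.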
I kinda understand why he would not release the source code. Perhaps he's finally decided to monetize some of his coding skills. Maybe in the future he'll start releasing some of those newer and bigger models to the public, given that others have already started doing so (EleutherAI with GPT-NeoX and FB with OPT, as mentioned in the sibling comment by infinityio).
Yes, TextSynth.com is a commercial service, see pricing [1]. If his code is faster than others' (I'd certainly believe it) then it's quite valuable, and he deserves to be able to monetise it. Edit: Also, OpenAI is slashing prices for GPT-3 by 2-3x tomorrow because of "progress in making our models more efficient to run" [2].
Also, he was/is competing for the Hutter Prize with nncp, but his entry falls outside the prize's requirements: CPU time, RAM, and most importantly the rule that submissions shouldn't require a modern CPU (with AVX2) or a GPU. Otherwise he could have won it. I suspect that's actually the biggest reason he implemented libnc without GPU support initially. He has asked for the rules to be changed to allow AVX2, and I believe they eventually will be. So he won't give away the source for nncp yet, but he will have to open-source it to receive the prize.
I've had success with GPT-J (6B) [0] and GPT-NeoX (20B) [1], but they probably aren't quite the quality level you'll want to have.
On the other hand, Facebook has recently released the weights for a few sizes of their OPT model [2]. I haven't tried it, but that might be worth looking into, because they claim that their model is comparable to Davinci.
Note that for CPU inference you generally can't use float16 datatypes; stick to float32, otherwise it might error out.
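A rough sketch of the float16 point, assuming you load the model through Hugging Face `transformers` with PyTorch (neither is named above, so treat that as my assumption). The helper names `pick_dtype_name` and `load_model` are made up for illustration; `AutoModelForCausalLM.from_pretrained` and its `torch_dtype` argument are the real API.

```python
def pick_dtype_name(device: str) -> str:
    """Return the dtype name to load weights in for a given device.

    Assumption for this sketch: float32 on CPU (half-precision kernels
    may be missing there), float16 on anything else (e.g. "cuda").
    """
    return "float32" if device == "cpu" else "float16"


def load_model(model_name: str, device: str = "cpu"):
    """Load a causal LM with a dtype that suits the target device.

    Heavy imports live inside the function so that the dtype helper
    above can be used without torch/transformers installed.
    """
    import torch
    from transformers import AutoModelForCausalLM

    dtype = getattr(torch, pick_dtype_name(device))  # e.g. torch.float32
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=dtype)
    return model.to(device)


if __name__ == "__main__":
    print(pick_dtype_name("cpu"))
    print(pick_dtype_name("cuda"))
```

Keep in mind that float32 weights take twice the RAM of float16, so a 6B-parameter model needs very roughly 24 GB just for the weights.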
I am surprised that there have been no responses to your question, as this would be like the holy grail for people interested in privacy-oriented and robust peer-to-peer messaging.
Unfortunately, I wouldn't be able to add much know-how on the technical aspects.
Yup, getting a few of these each day and manually marking them as spam (to no avail). Not sure how Gmail's filter is missing them. A simple regex matcher would catch 99% of those. It seems like even the Gmail registration process for the spam accounts has been automated?
Some other examples:
Чарльз Некрасов <qxazagesuf@gmail.com>
Фома Авдеев <tpewixicig@gmail.com>
Порфирий Угримов <solodqez@g...
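For what it's worth, here is the kind of simple regex matcher I have in mind, as a rough sketch. The pattern is my own guess based only on the examples above (Cyrillic display name plus a random-looking all-lowercase gmail address); it is not anything Gmail actually uses, and the 8-12 letter range is an assumption.

```python
import re

# Heuristic (an assumption from the samples above): two or more Cyrillic
# words as the display name, then an 8-12 letter all-lowercase local part
# with no digits or dots, at gmail.com.
SPAM_SENDER = re.compile(
    r"^[\u0400-\u04FF]+(?:\s+[\u0400-\u04FF]+)+\s*<[a-z]{8,12}@gmail\.com>$"
)


def looks_like_spam_sender(from_header: str) -> bool:
    """Return True if the From header fits the spam pattern above."""
    return SPAM_SENDER.match(from_header.strip()) is not None


if __name__ == "__main__":
    for sender in (
        "Чарльз Некрасов <qxazagesuf@gmail.com>",
        "Фома Авдеев <tpewixicig@gmail.com>",
        "John Smith <john.smith@gmail.com>",
    ):
        print(sender, "->", looks_like_spam_sender(sender))
```

Of course this would also flag a real person with a Cyrillic name and a short lowercase address, so it's only good as one signal among others.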
I wish they had tried to tackle StarCraft 1: Brood War AI, where the game is (arguably) even harder than SC2. Besides, there's a healthy Brood War AI community with some very strong AIs out there.
This is your typical modern "objective" journalism. The only thing not clear is whose financial and/or political interests they are trying to boost today. Follow the money/power, I guess?
It's the same situation in many developing countries around the world. The local taxi cartels have been using lobbying and outright violence to keep ride-sharing services away or banned.
Of all the natural predators mentioned above, only dragonflies make a real difference. It's night and day in the same area depending on whether there's a healthy population of dragonflies nearby.