Ok, crazy tangent; Where agents will potentially become *extremely* useful/dysto...

DariusKocar · on Feb 21, 2024

I'm working on this! https://www.perfectmemory.ai/

It's encrypted (on top of Bitlocker) and local. There's all this competition who makes the best, most articulate LLM. But the truth is that off-the-shelf 7B models can put sentences together with no problem. It's the context they're missing.

crooked-v · on Feb 21, 2024

I feel like the storage requirements are really going to be these issue for these apps/services that run on "take screenshots and OCR them" functionality with LLMs. If you're using something like this a huge part of the value proposition is in the long term, but until something has a more efficient way to function, even a 1-year history is impractical for a lot of people.

For example, consider the classic situation of accidentally giving someone the same Christmas that you did a few years back. A sufficiently powerful personal LLM that 'remembers everything' could absolutely help with that (maybe even give you a nice table of the gifts you've purchased online, who they were for, and what categories of items would complement a previous gift), but only if it can practically store that memory for a multi-year time period.

DariusKocar · on Feb 21, 2024

It's not that bad. With Perfect Memory AI I see ~9GB a month. That's 108 GB/year. HDD/SSDs are getting bigger than that every year. The storage also varies by what you do, your workflow and display resolution. Here's an article I wrote on my finding of storage requirements. https://www.perfectmemory.ai/support/storage-resources/stora...

And if you want to use the data for LLM only, then you don't need to store the screenshots at all. Then it's ~ 15MB a month

jascination · on Feb 21, 2024

> That's 108 GB/year. HDD/SSDs are getting bigger than that every year.

Cries in MacBook Pro

technofiend · on Feb 21, 2024

Outboard TB 3/4 storage only seems expensive until you price it against Apple's native storage. Is it slower? Of course! Is it fast enough? Probably.

darreninthenet · on Feb 21, 2024

I recently moved my macOS installation to an external Thunderbolt drive - it's faster than the internal SSD.

technofiend · on Feb 22, 2024

Considering storage is a wasting asset and what Apple charges, this makes perfect sense to me.

darreninthenet · on Feb 22, 2024

The funny thing is Apple even have a support article on how to do this (and actually say in it "may improve your performance") I literally followed it step by step and it was very easy and had no issues.

ayewo · on Feb 22, 2024

Can you share the Thunderbolt drive you got?

darreninthenet · on Feb 22, 2024

https://glyphtech.com/products/atom-pro?variant=321211999191...

Shipped to the UK for me added a bit to the overall price with shipping and import duty but it was still better value for money and hugely reliable brand than anything I could have bought domestically.

pauby · on Feb 22, 2024

It's Windows only so it won't run on your Mac anyway :-)

lillecarl · on Feb 21, 2024

PerfectMemory is only available on Windows at the moment.

kristofferR · on Feb 22, 2024

https://Rewind.ai is the macOS equivalent

glenneroo · on Feb 22, 2024

Except that Rewind uses chatGPT whereas this runs entirely locally. I would like to note though that Anonymous Analytics are enabled as well as auto-updates, both of which I disabled for privacy reasons. Encryption is also disabled by default. I just blocked everything with my firewall for peace of mind :)

dr_kiszonka · on Feb 21, 2024

Does storage use scale linearly with the number of connected monitors (assuming each monitor uses the same resolution)?

DariusKocar · on Feb 22, 2024

Most screenshots are of the application window in the foreground, so unless your application spans all monitors, there is no significant overhead with multiple monitors. DPI on the other hand has a significant impact. The text is finer, taking more pixels...

behnamoh · on Feb 22, 2024

Why should DPI matter if the app is taking screenshots?

rezonant · on Feb 22, 2024

Because screenshots are in pixels, not inches.

pseudosavant · on Feb 21, 2024

Is the 15mb basically embeddings from the video screenshots? What would it recall if there isn't the screenshots saved?

rlt · on Feb 22, 2024

I’m not sure if the above product does this, but you could use a multimodal model to extract descriptions of the screenshots and store those in a vector database with embeddings.

dav43 · on Feb 22, 2024

I set up two years ago a cron to screenshot every minute.

Just did the second phase of using ocrmac (vision kit cli on GitHub) that extracts text and dumps it in a SQLite with FTS5.

It’s simplistic but does the job for now.

I looked at reducing storage requirements by using image magik to only store the difference between images - some 5 min sequence are essentially the same screen - but let that one go.

sdenton4 · on Feb 22, 2024

/using image magik to only store the difference between images/

Well, that's basically how video codecs work... So might as well just find some codec params which work well with screen capture, and use an existing encoder.

xhrpost · on Feb 22, 2024

Thanks for sharing. Curious, what main value adds have you gotten out of this data?

dav43 · on Feb 26, 2024

I’m loose with my memory and I’d often recall reading or looking at something and could never find it in safari history etc. with info spread across WhatsApp emails files web history is helped nudge me in the right direction here and there. Saved me once when i made an online purchase, never got an email confirmation as well.

rlt · on Feb 22, 2024

I think ultimately you’d want it to summarize that down to something like:

“Purchased socks from Amazon for $10 on 12/4/2024 at 5:04PM, shipped to Mom, 1600 Pennsylvania Av NW, Washington DC 20500, order number 1463355337

Probably stored in a vector DB for RAG.

pennomi · on Feb 22, 2024

Maybe. Until we find there’s a better way to encode the information and need the unfiltered, original context so it can be used with that new method.

jasonjayr · on Feb 22, 2024

This is where Microsoft (and Apple) has a leg up -- they can hook the UI at the draw level and parse the interface far more reliably + efficently than screenshot + OCR.

joebob42 · on Feb 22, 2024

Google too, for all practical purposes, since presumably this is mostly just watching you use chrome 90% of the time.

behnamoh · on Feb 22, 2024

All the more reason not to use Chrome...

xattt · on Feb 21, 2024

This reminds me of how Sherlock, Spotlight and its iterations came to be. It was very resource intensive to index everything and keep a live db, until it was not.

smusamashah · on Feb 21, 2024

Your website and blog are very low on details on how this is working. Downloading and installing an mai directly feels unsafe imo. Especially when I don't know how this software is working. Is it recording a video, performing OCR continuously, taking just screenshots

No mention of using any LLMs in there at all which is how you are presenting it in your comment here.

DariusKocar · on Feb 21, 2024

Feedback taken. I'll add more details on how this works for us technical people. LLM integration is in progress and coming soon.

Any idea what would make you feel safe? 3rd party verification? I had it verified and published by the Microsoft Store. I feel eventually it all comes down to me being a decent person.

rocho · on Feb 22, 2024

I'd consider installing it if it had:

* In-depth technical explanation with architecture diagrams

* Open-source and self-hosted version

Also I didn't understand if it talks to a remote server or not. Because that's a big blocker for me.

itsanaccount · on Feb 22, 2024

welp. this pretty much convinces me that its time I get out of tech. lean into the tradework I do in my spare time.

because I'm sure you and people like you will succeed in your endeavors, naively thinking you're doing good. and you or someone like you will sell out, the most ruthless investor will take what you've built and use it as one more cludgel of power to beat the rest of us with.

sconely · on Feb 22, 2024

If you want to help, use your knowledge to help shape policy. Because it is coming/already happening, and it will shape your life even if you are just living a simple life. I guarantee you that your city and state governments are passing legislation to incorporate AI to affect your life if they can be sold on it in the name of "good".

itsanaccount · on Feb 22, 2024

I live next to the Amish, trust me my township isn't passing anything related to AI.

For a reality check, name one instance of policy that has stopped the amoral march of tech being a tool of power to the hands of the few? Last one I can name is when they broke up Ma Bell. Now of course you can pick Verizon or AT&T, so that worked. /s

And that was 42 years ago.

milesskorpen · on Feb 21, 2024

Basically looks like rewind.ai but for the PC?

cyrux004 · on Feb 21, 2024

exactly. the UI is shockingly similar

mdrzn · on Feb 23, 2024

I installed it and kept it open for a full day but apparently it hasn't "saved" anything, and even if I open a Wiki page and a few minutes later search for that page, it returns nothing. Tried reading the Support FAQs on the website to no avail. Screen recording is on.

arthurcolle · on Feb 22, 2024

This looks cool, I hope you support macOS at some point in the future

m-GDEV · on Feb 21, 2024

Any plan to implement this on macOS or Linux?

Zetaphor · on Feb 22, 2024

I got 90% of this built on Linux (around KDE Wayland) before other interests/priorities took over:

https://github.com/Zetaphor/screendiary/

ebri · on Feb 22, 2024

This seems very very interesting. I'm still learning python so probably can't build on this. But like a cheap mans' version of this would be to take a screenshot every couple of minutes, OCR it and send to it gpt for some kind of processing (or not, just keep it as a log). Right? Or am I missing something?

Zetaphor · on Feb 22, 2024

Yes, that's exactly what's happening here, minus the sending it off to a third-party.

I didn't see the benefit when the OCR content is fully searchable, in addition to not wanting to pay OpenAI to spy on me.

kristofferR · on Feb 22, 2024

macOS: https://www.rewind.ai/

wingerlang · on Feb 22, 2024

macOS: https://screenmemory.app/

This is my application, it does not have AI running on top.

hodanli · on Feb 22, 2024

statistics about the usage would be cool

Animats · on Feb 21, 2024

> Imagine it just watching you coding for months, planning stuff, researching things, it could potentially give you personal and professional advice from deep knowledge about you.

And then announcing "I can do your job now. You're fired."

ghxst · on Feb 21, 2024

That's why we would want it to run locally! Think about a fully personalized model that can work out some simple tasks / code while you're going out for groceries, or potentially more complex tasks while you're sleeping.

galaxyLogic · on Feb 22, 2024

"AI Companion" is a bit like spouse. You are married to it in the long run, unless you decide to divorce it. Definitely TRUST is the basis of marrage, and it should be the same for AI models.

As in human marriage, there should be a law that said your AI-companion cannot be compelled to testify against you :-)

huytersd · on Feb 22, 2024

But unlike a spouse you can reset it back to an earlier state you preferred.

galaxyLogic · on Feb 23, 2024

That's a noteworthy difference. Maybe AI only becomes truly "human" when it can't be reset. Maybe only then we can truly trust it - because it has the capability to betray us and yet it won't. (if it does then we don't trust it any more)

underdeserver · on Feb 21, 2024

It's local to your employer's computer.

ssl-3 · on Feb 21, 2024

It can be.

It can also be local to my own computer. People do write software while they're away from work.

EGreg · on Feb 22, 2024

How quaint.

You humans think that the AI will have someone in charge of it. Look, that's a thin layer that can be eliminated quickly. It's like when you build a tool that automates the work of, say, law firms but you don't want law firms getting mad that you're giving it away to their clients, so you give it to the law firms and now they secretly use the automating software. But it's only a matter of time before the humans are eliminated from the loop:

https://www.youtube.com/watch?v=SrIf0oYTtaI

The employee will be eliminated. But also the employer. The whole thing can be run by AI agents, which then build and train other AI agents. Then swarms of agents can carry out tasks over long periods of time, distributed, while earning reputation points etc.

This movie btw is highly recommended, I just can't find it anywhere anymore due to copyright. If you think about it, it's just a bunch of guys talking in rooms for most of the movie, but it's a lot more suspenseful than Terminator: https://www.youtube.com/watch?v=kyOEwiQhzMI

ssl-3 · on Feb 22, 2024

We've all seen the historical documents. We know how this will all end up, and that the end result is simply inevitable.

And since that has to be the case, we might as well find fun and profit wherever we can -- while we still can.

If that means that my desktop robot is keeping tabs on me while I write this, then so be it as long as I get some short-term gain. (There can be no long-term gain.)

albumen · on Feb 21, 2024

Have it running on your personal comp, monitoring a screen-share from your work comp. (But that would probably breach your employment contract re saving work on personal machines.)

eru · on Feb 22, 2024

You could point your local computer's webcam at the work computer.

It probably breaks the spirit of the employment contract just as hard, but it's essentially undetectable for the work computer.

pbhjpbhj · on Feb 22, 2024

Is there an app that recreates documents this way? Presumably a ML model that works on images and text could take several overlapping images of a document and piece then together as a reproduction of that document?

Kinda like making a 3D CAD model from a few images at different angles, but for documents?

eru · on Feb 22, 2024

Not exactly the same, but you might like https://arstechnica.com/gaming/2024/02/f-zero-courses-from-a...

mostlysimilar · on Feb 21, 2024

Corporations would absolutely force this until it could do your job and then fire you the second they could.

bugbuddy · on Feb 22, 2024

I heard somewhere that dystopia is fundamentally unstable. Maybe they should test that question.

ChrisClark · on Feb 21, 2024

That sounds a lot like Learning To Be Me, by Greg Egan. Just not quite as advanced, or inside your head.

_vk_ · on Feb 22, 2024

For anyone unfamiliar with this story:

https://philosophy.williams.edu/files/Egan-Learning-to-Be-Me...

brailsafe · on Feb 22, 2024

Jokes on it, already unemployed

slg · on Feb 21, 2024

>Isolated, encrypted and local of course.

And what is the likelihood of that "of course" portion actually happening? What is the business model that makes that route more profitable compared to the current model all the leaders in this tech are using in which they control everything?

worldsayshi · on Feb 22, 2024

Maybe it doesn't have to be more profitable. Even if open source models would always be one step behind the closed ones that doesn't mean they won't be good enough.

shostack · on Feb 22, 2024

This. I want an AI assistant like in the movie Her. But when I think about the realities of data access that requires, and my limited trust in companies that are playing in this space to do so in a way that respects my privacy, I realize I won't get it until it is economically viable to have an open source option run on my own hardware.

fragmede · on Feb 22, 2024

Given that http://rewind.ai is doing just that, the odds are pretty good!

slg · on Feb 22, 2024

No they aren't. Rewind uses ChatGPT so data is sent off your local device[1].

I understand the actual screen recordings don't leave your machine, but that just creates a catch-22 of what does. Either the text based summaries of those recordings are thorough enough to still be worthy of privacy or the actual answers you get won't actually include many details from those recordings.

[1] - https://help.rewind.ai/en/articles/7791703-ask-rewind-s-priv...

fragmede · on Feb 22, 2024

ah yeah fair point. it's the screen recordings I'm worried about leaving my computer

bonoboTP · on Feb 22, 2024

It doesn't even have to coach you at your job, simply a LLM-powered fuzzy retrieval would be great. Where did I put that file three weeks ago? What was that trick that I had to do to fix that annoying OS config issue? I recall seeing a tweet about a paper that did xyz about half a year ago, what was it called again?

Of course taking notes and bookmarking things is possible, but you can't include everything and it takes a lot of discipline to keep things neatly organized.

So we take it for granted that every once in a while we forget things, and can't find them again with web searching.

But with the new LLMs and multimodal models, in principle this can be solved. Just describe the thing you want to recall in vague natural language and the model will find it.

And this kind of retrieval is just one thing. But if it works well, we may also grow to rely on it a lot. Just as many who use GPS in the car never really learn the mental map of the city layout and can't drive around without it. Yeah, I know that some ancient philosopher derided the invention of books the same way (will make our memory lazy). But it can make us less capable by ourselves, but much more capable when augmented with this kind of near-perfect memory.

Nition · on Feb 22, 2024

Eventually someone will realise that it'd also be great for telling you where you left your keys, if it'd film everything you see instead of just your screen.

goatlover · on Feb 22, 2024

I simply am not going to have my entire life filmed by an form of technology, I don't care what the advantages are. There's a limit to the level of dystopian dependent uses of these technologies I'm going to put up with. I sincerely hope the majority of the human race feels the same way.

bonoboTP · on Feb 22, 2024

This is not how most people think. If it's convenient and has useful features, it will spread. Soon enough it will be expected that you use it, just like it's expected today to have a smartphone and install apps to participate in events, or to use zoom etc.

By the way, Meta is already working to realize such a device. Like Alexa on steroids, but it also sees what you see and remembers it all. It's not speculation, it is being built.

https://twitter.com/_akhaliq/status/1760502294016036986

roywiggins · on Feb 22, 2024

People already fill their homes with nanny cams. Very soon someone will hook those up to LLMs so you can ask it what happened at home while you were gone.

prmoustache · on Feb 22, 2024

I think that is mostly a regional USA thing.

What they fill their homes with are definitely microphones, with the google assistant and amazon echos.

alex_suzuki · on Feb 22, 2024

The Black Mirror episode „The Entire History of You“ comes to mind. It’s quite dystopian.

bonoboTP · on Feb 22, 2024

Also, just in case someone thinks this is an exaggeration, Meta is actively working to realize this with the Aria glasses. They just released another large dataset with such daily activities.

https://twitter.com/_akhaliq/status/1760502294016036986

Privacy concerns will not stop it, just like it didn't stop social media (and other) tracking. People have been taught the mantra that "if you have nothing to hide, ...", and everyone accepts it.

bonoboTP · on Feb 22, 2024

True but that's still a bit further away. The screen contents (when mostly working with text) is a much better constrained and cleaner environment compared to camera feeds from real life. And most of the fleeting info we tend to forget appears on screens anyway.

cush · on Feb 21, 2024

https://www.rewind.ai/ seems to be exactly this

behat · on Feb 21, 2024

Heh. Built a macOS app that does something like this a while ago - https://github.com/bharathpbhat/EssentialApp

Back then, I used on device OCR and then sent the text to gpt. I’ve been wanting to re-do this with local LLMs

zoogeny · on Feb 21, 2024

Why watch your screen when you could feed in video from a wearable pair of glasses like those Instagram Ray Bans. And why stop at video when you could have it record and learn from a mic that is always on. And you might as well throw in a feed of your GPS location and biometrics from your smart watch.

When you consider it, we aren't very far away from that at all.

pier25 · on Feb 21, 2024

> encrypted and local of course

Only for people who'd pay for that.

Free users would become the product.

dpkirchner · on Feb 22, 2024

I noticed you code this way, may i recommend a Lenovo Thinkpad with an Intel Xeon processor? You're sure to "wish everything was a Lenovo."

graphe · on Feb 22, 2024

Certainly! Here is a list of great thinkpads.

The x230 is a popular and interesting thinkpad with a powerful i5 processor suitable for today’s needs.

The T60 can also suit your needs and is one of the last IBM thinkpads. It featured the latest Intel mobile processor at the time of its release.

If you want the most powerful thinkpad the T440p is sure to suit you perfectly without leaving your morals behind.

fillskills · on Feb 21, 2024

Unless its open sourced :)

troupo · on Feb 21, 2024

In modern world open code often doesn't mean much. E.g. Chrome is opensourced. And yet no one really contributes to it or has any say over the direction its going: https://twitter.com/RickByers/status/1715568535731155100

stavros · on Feb 21, 2024

Open source isn't meant to give everyone control over a specific project. It's meant to make it so, if you don't like the project, you can fork it and chart your own direction for it.

freedomben · on Feb 22, 2024

exactly. open source doesn't mean you can tell other people what to do with their time and/or money. it does mean that you can use your own time and/or money to make it what you want it to be. The fact that there are active forks of Chromium is a pretty good indicator that it is working

userbinator · on Feb 22, 2024

It's meant to make it so, if you don't like the project, you can fork it and chart your own direction for it.

...accompanied by the wrath of countless others discouraging you from trying to fork if you even so much as give slight indications of wanting to do so, and then when you do, they continue to spread FUD about how your fork is inferior.

I've seen plenty of discussions here and elsewhere where the one who suggests forking got a virtual beating for it.

stavros · on Feb 22, 2024

Is it up to the open source licenses to police the opinions people have?

Buttons840 · on Feb 21, 2024

A browser is an extreme case, one of the most difficult types of software and full of stupid minutia and legacy crap. Nobody want to volunteer for that.

Machine learning is fun and ultimately it doesn't require a lot of code. If people have the compute, open source maintainers will have the interest to exploit it due to the high coolness-to-work-required ratio.

pier25 · on Feb 21, 2024

Chrome is not open sourced, Chromium is.

troupo · on Feb 22, 2024

A distinction without meaning

charcircuit · on Feb 21, 2024

The graph seems to be that browsers are able to focus more resources towards improving the browser than improving the browser engine to meet their needs. If the browser engine already has what they need there is less of need for companies to dig deep into the internals. It's a sign of maturity and also a sign that open source work is properly being funded.

DariusKocar · on Feb 21, 2024

One needs to follow the money to find the true direction. I think the ideal setup is that such a product is owned by a public figure/org who has no vested interest in making money or using it in a way.

searchableguy · on Feb 22, 2024

I pre-ordered the rewind pendant. It will listen 24/7 and help you figure out what happened.

I bet meta is thinking of doing this with quest once the battery life improves.

https://rewind.ai/pendant

1shooner · on Feb 22, 2024

This service says it's local and privacy-first, but it sends to OpenAI?

>Our service, Ask Rewind, integrates OpenAI’s ChatGPT, allowing for the extraction of key information from your device’s audio and video files to produce relevant and personalized outputs in response to your inputs and questions.

vineyardmike · on Feb 22, 2024

I'm not related to the project, but I think they mean that it stores the audio locally, and can transcribe locally. They (plan to) use GPT for summarization. They said you should be able to access the recording locally too.

The rest of the company has info on their other free/paid offerings and the split is pretty closely "what do we need to pay for an API to do vs do locally".

Again, I'm not associated with them, but that was my expectation after looking at it.

ramenbytes · on Feb 22, 2024

Black Mirror strikes again.

frizlab · on Feb 21, 2024

I would hate that so much.

FirmwareBurner · on Feb 21, 2024

IKR, Who wouldn't want another Clippy constantly nagging you, but this time with a higher IQ and more intimate knowledge of you? /s

kreeben · on Feb 21, 2024

Clippy, definition: bot created by mega corp.

Clippy + high IQ: red flag, right here

Clippy + high IQ + intimate knowledge of you: do you seriously want that? Why?

fragmede · on Feb 22, 2024

Life's never gotten to you that you've just wanted a bit of help sometime?

mixmastamyk · on Feb 21, 2024

"It looks like you're writing a suicide note... care for any help?"

https://www.reddit.com/r/memes/comments/bb1jq9/clippy_is_qui...

system2 · on Feb 21, 2024

If 7 second video consumed 1k token, I'd assume the budget must be insane to process such prompt.

MyFirstSass · on Feb 21, 2024

Yeah not feasible with todays methods and rag / lora shenanigans, but the way the field is moving i wouldn't be surprised if new decoder paradigms made it possible.

Saw this yesterday, 1M context window but haven't had any time to look into it, just an example new developments happening every week:

https://www.reddit.com/r/LocalLLaMA/comments/1as36v9/anyone_...

Invictus0 · on Feb 21, 2024

That's a 7 second video from an HD camera. When recording a screen, you only really need to consider whats changing on the screen.

nostrebored · on Feb 21, 2024

That’s not true. What content is important context on the screen might change dependent on the new changes.

MetalGuru · on Feb 22, 2024

The point is you can do massive compression. It’s more like a sequence of sparse images than video.

yazaddaruvala · on Feb 21, 2024

Unlikely to be a prompt. It would need to be some form of fine tuning like LORA.

philips · on Feb 21, 2024

I have a friend building something like that at https://perfectmemory.ai

chancemehmu · on Feb 21, 2024

That's impel - https://tryimpel.com

crooked-v · on Feb 21, 2024

The "smart tasks" functionality looks like the most compelling part of that to me, but it would have to be REALLY reliable for me to use it. 50% reliability in capturing tasks is about the same as 0% reliability when it comes to actually being a useful part of anything professional.

dmix · on Feb 21, 2024

The hard part of any smart automation system, and probably 95% of the UX is timing and managing the prompts/notifications you get.

It can do as much as it wants in the background turning that into timely and non-intrusive actionable behaviours is extremely challenging.

I spent a long time thinking about a global notification consumption system that would parse all desktop, mobile, email, slack, web app, etc notifications into a single stream and then intelligently organizes it with adaptive timing and focus streams.

The cross platform nature made it infeasible but it was a fun thought experiment because we often get repeated notifications on every different device/interface and most of the time we just zone it out cuz it’s overload.

Adding a new nanny to your desktop is just going to pile it on even more so you have to be careful.

dweekly · on Feb 21, 2024

There's limited information on the site - are you using them or affiliated with them? What's your take? Does it work well?

chancemehmu · on Feb 21, 2024

I have been using their beta for the past two weeks and it's pretty good. Like I am watching youtube videos and it just pops up automatically.

I don't know if it's public yet, but they sent me this video with the invite: https://youtu.be/dXvhGwj4yGo

isaac-sway · on Feb 22, 2024

I'd be very keen to beta test as well. If you or anyone else has an invite code, please do get in touch.

oconnor663 · on Feb 21, 2024

A version of this that seems both easier and less weird would be an AI that listens to you all the time when you're learning a foreign language. Imagine how much faster you could learn, and how much more native you could ultimately get, if you had something that could buzz your watch whenever you said something wrong. And of course you'd calibrate it to understand what level you're at and not spam you constantly. I would love to have something like that, assuming it was voluntary...

lucubratory · on Feb 21, 2024

I think even aside from the more outlandish ideas like that one, just having a fluent native speaker to talk to as much as you want would be incredibly valuable. Even more valuable if they are smart/educated enough to act as a language teacher. High-quality LLMs with a conversational interface capable of seamless language switching are an absolute killer app for language learning.

A use that seems scientifically possible but technically difficult would be to have an LLM help you engage in essentially immersion learning. Set up something like a pihole, but instead of cutting out ads it intercepts all the content you're consuming (webpages, text, video, images) and translates it to the language you're learning. The idea would be that you don't have to go out and find whole new sources of language to set yourself with a different language's information ecosystem, you can just press a button and convert your current information ecosystem to the language you want to learn. If something like that could be implemented it would be incredibly valuable.

RogerL · on Feb 22, 2024

Don't we have that? My browser offers to translate pages that aren't in English, youtube creates auto generated closed captions, which you can then have it translate to English (or whatever), we have text to speech models for the major languages if you want to hear it verbally (I have no idea if the youtube CC are accessible via an api, but it is certainly something google could do if they wanted to).

I'll probably get pushback on the quality of things like auto-generated subtitles, but I did the above to watch and understand a long interview I was interested in but don't possess skill in the language they were using. That was to turn the content into something I already know, but I could do the reverse and turn English content into French or whatever I'm trying to learn.

lucubratory · on Feb 22, 2024

The point is to achieve immersion learning. Changing the language of your subtitles on some of the content you watch (YouTube + webpages isn't everything the average person reads) isn't immersion learning, you're often still receiving the information in your native language which will impede learning. As well, because the overwhelming majority of language you read will still be in your native language you're switching back and forth all the time, which also impedes learning. There's a reason that immersion learning specifically is so effective, and one thing AI could achieve is making it actually feasible to achieve without having to move countries or change all of your information sources.

lawlessone · on Feb 21, 2024

>assuming it was voluntary...

Imagine if it was wrong about something. But every time you tried to submit the bug report it disables your arms via Nueralink.

Solvency · on Feb 22, 2024

I love how in a sea of navel-gazing ideas, this one is randomly being downvoted to oblivion. Does HN hate learning new languages or something?

phatfish · on Feb 22, 2024

Learning and a "personal tutor" seem like a sweet spot for generative AI. It has the ability to give a conversational representation to the sum total of human knowledge so far.

When it can gently nag you via a phone app to study and have a fake zoom call with you to be more engaging it feels like that could get much better results than the current online courses.

nebula8804 · on Feb 22, 2024

It would be dangerously valuable to bad actors but what if it is available to everyone? Then it may become less dangerous and more of a tool to help people improve their lives. The bad actor can use the tool to arbitrage but just remove that opportunity to arbitrage and there you go!

CamperBob2 · on Feb 21, 2024

I liked this idea better in THX-1138.

MyFirstSass · on Feb 21, 2024

One of the movies i've had on my watch list for far too long, thanks for reminding me.

But yeah, dystopia is right down the same road we're all going right now.

mdanger007 · on Feb 21, 2024

Reading The Four by Scott Galloway, Apple, Facebook, Google, and Amazon were dominating the market 7 years ago generating 2.3 trillion in wealth. They're worth double that now.

The Four, especially with its AI, is going to control the market in ways that will have a deep impact on government and society.

MyFirstSass · on Feb 21, 2024

Yeah, that's one of the developments i'm unable to spin positively.

As technological society advances the threshold to enter the market with anything not completely laughable becomes exponentially harder, only consolidating old money or the already established right?

What i found so amazing about the early internet, or even just the internet 2.0 was the possibility to create a platform/marketplace/magazine or whatever, and actually have it take off and get a little of the shared growth.

But now it seems all growth has become centralised to a few apps and marketplaces and the barrier to entry is getting harder by the hour.

Ie. being an entrepreneur is harder now because of tech and market consolidation. But potentially mirrored in previous eras like the industrialisation - i'm just not sure we'll get another "reset" like that to allow new players.

Please someone explain how this is wrong and there's still hope for the tech entrepreneurs / sideprojects!

jjjjj55555 · on Feb 22, 2024

Seems like the big tech cos are going to build the underlying infrastructure but you'll still be able to identify those small market opportunities and develop and sell solutions to fit them.

chamomeal · on Feb 21, 2024

Not crazy! I listened to a software engineering daily episode about pieces.app. Right now it’s some dev productivity tool or something, but in the interview the guy laid out a crazy vision that sounds like what you’re talking about.

He was talking about eventually having an agent that watches your screen and remembers what you do across all apps, and can store it and share it with you team.

So you could say “how does my teammate run staging builds?” or “what happened to the documentation on feature x that we never finished building”, and it’ll just know.

Obviously that’s far away, and it was just the ramblings of excited founder, but it’s fun to think about. Not sure if I hate it or love it lol

jerbear4328 · on Feb 21, 2024

Being able to ask about stuff other people do seems like it could be ripe with privacy issues, honestly. Even if the model was limited to only recording work stuff, I don't think I would want that. Imagine "how often does my coworker browse to HN during work" or "list examples of dumb mistakes my coworkers have made" for some not-so-bad examples.

bonoboTP · on Feb 22, 2024

Even later it will be ingesting camera feeds from your AR glasses and listening in on your conversations, so you can remember what you agreed on. Just like automated meeting notes with Zoom which already exists, but it will be for real life 24/7.

Speech-to-text works. OCR works. LLMs are quite good at getting the semantics of the extracted text. Image understanding is pretty good too already. Just with the things that already exist right now, you can go most of the way.

And the CCTV cameras will also all be processed through something like it.

az226 · on Feb 21, 2024

Rewind.ai

evaneykelen · on Feb 21, 2024

I have tried Rewind and found it very disappointing. Transcripts were of very poor quality and the screen capture timeline proved useless to me.

wingerlang · on Feb 22, 2024

If I may do some advertising, I specifically disliked the timeline in Rewind.ai so much so that I built my own application https://screenmemory.app. In fact the timeline is what I work on the most and have the most plans for.

Falimonda · on Feb 21, 2024

If it wasn't for the poor transcript quality would you consider Rewind.ai to be valuable enough to use day-to-day?

Could you elaborate on what was useless about the screen capture timeline?

evaneykelen · on Feb 22, 2024

I would probably not consider using it, and it's likely due to these factors:

1. I use a limited set of tools (Slack, GitHub, Linear, email), each providing good search capabilities.

2. I can remember things people said, and I said, in a fairly detailed way, and accessing my memory is faster than using a UI.

Other minor factors include: I take screenshots judiciously (around 2500-3000 per year) and bookmark URLs (13K URLs on Pinboard). Rewind did not convince me that it was doing all of this twice as well.

abrichr · on Feb 22, 2024

We are building this at https://openadapt.ai, except the user specifies when to record.

delegate · on Feb 22, 2024

Can also add the photos you take and all the chats you have with people (eg. whatsapp, fb, etc), the sensor information from your phone (eg. location, health data, etc).

This is already possible to implement today, so it's very likely that we'll all have our own personal AIs that know us better than we do.

huytersd · on Feb 22, 2024

If that much processing power is that cheap, this phase you’re describing is going to be fleeting because at that point I feel like it could just come up with ideas and code it itself.

te_chris · on Feb 22, 2024

I could've used this before where I accidentally booked a non-transferrable flight on a day where I'd also booked tickets to a sold out concert I want(ed) to attend.

Buttons840 · on Feb 21, 2024

Perhaps even more valuable is if AI can learn to take raw information and display it nicely. Maybe would could finally move beyond decades of crusty GUI toolkits and browser engines.

psychoslave · on Feb 22, 2024

Perfect, finally I can delegate that lengthy hours spent reading HN fantasies about AI and the laborious art of crafting sarcastic comments.

bagful · on Feb 22, 2024

Amplified Intelligence - I am keenly interested in the future of small-data machine learning as a potential multiplier for the creative mind

busymom0 · on Feb 22, 2024

And then imagine when employers stop asking for resume, cover letters, project portfolios, github etc and instead ask you to upload your entire locally trained LLM.

spaceman_2020 · on Feb 22, 2024

The dystopian angle would be when companies install agents like these on your work computer. The agent learns how you code and work. Soon enough, an agent that imitates you completely can code and work instead of you.

At that point, why pay you at all?

parentheses · on Feb 22, 2024

Aside. Is this your first Sass or Saas?

foolfoolz · on Feb 21, 2024

you could design a similar product to do the opposite and anonymize your work automatically

MetalGuru · on Feb 22, 2024

Isn’t this what rewind does?

bushbaba · on Feb 22, 2024

Basically Google’s current search model, just expanded to ChatGPT style. Great….

EGreg · on Feb 22, 2024

Imagine if it starts suggesting the ideal dating partner as both of you browse profiles. Actually, dating sites can do that now.

dustingetz · on Feb 22, 2024

thoughtcrime