
Until we started to see LLMs, and the tools that can be built with them, I doubted the possibility of Star Trek's voice command system. Asking the computer to clarify some concept, or to filter and reduce data sets based on arbitrary criteria, was pure science fiction.

Seeing something like this makes me think that an arbitrary holodeck command like "Paris, 1950s, rainy afternoon" is suddenly not the challenging part of the equation. It's really exciting.
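
As a rough illustration of the data-filtering part: an untested sketch using OpenAI's chat API. The model name, records, and criterion are placeholders I made up, not anything from the article:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    rows = ["USS Enterprise, 2363", "Paris bistro, 1954", "Mars colony, 2103"]

    def matches(row: str, criterion: str) -> bool:
        # Ask the model a yes/no question about one record.
        reply = client.chat.completions.create(
            model="gpt-4",
            messages=[{
                "role": "user",
                "content": f"Does this record match the criterion "
                           f"'{criterion}'? Answer only yes or no.\n{row}",
            }],
        )
        return reply.choices[0].message.content.strip().lower().startswith("yes")

    # "Computer, show me only the 20th-century entries."
    print([r for r in rows if matches(r, "set in the 20th century")])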



Here's an image result from Midjourney for "Paris, 1950s, rainy afternoon". No additional editing, and I intentionally avoided adding any text to the prompt beyond your own words.

https://i.imgur.com/IYuh29H.png

Not perfect but man, we're getting pretty close.


Interesting. I apparently opened that image, then forgot about it, and later saw it again without context. I looked at it, thought "wow, that's a cool picture from back in the day", looked at the people in it, and left.

I just ran into your comment again (with a purple link) and did some reflection. Upon reexamination, it's clear that the picture is fake (because I'm looking for it), but when I wasn't looking for it, it's interesting how all the "hot spots", the interesting pieces of the picture, are pretty good, while the (IMO) lackluster parts are the "less interesting" pieces, like the ends of the roads where it blurs out. I wonder if that bias is inherently ingrained in the system.


The focus blur is repulsive. I think convincing focus blur will be the milestone at which this starts replacing stock photography.


It can get a bit better if the prompt is made more detailed. For instance, here are the four results I got for "Professional black and white photo of Paris in the 1950s, on a rainy afternoon. Leica 35mm lens. --s 1000" (--s 1000 lets it 'stylize' a bit more).

https://i.imgur.com/pPU7K0c.png

Things still get a little weird in the distance (particularly in photo 3), but I think overall it's a bit better. People who are really good at writing prompts could probably do even better, although one of the strengths of MidJourney V4 and V5 is that it can give good results without the traditional paragraph of "incredible, award winning, photo of the year" etc.


Very interesting. Photo 4 is a significant step in the right direction. It's refreshing that it doesn't veer towards a Gaussian look either. Thanks for sharing.


It's a nice image, but I find it's easy to spot AI-generated images when they try to depict very specific existing hardware. Here you can see all the cars are generic with a 1950s design; none are models that actually existed. Try asking an AI to draw you a Boeing 747-400, for example, and you'll see what I mean. Btw, have you noticed all the YouTube video thumbnails made by AI now? Easy to spot.


These systems sure hallucinate text.


Yes, because models like Midjourney are tiny compared to LLMs like GPT. I'm pretty sure there was a good Hacker News discussion on this recently, but with all the AI talk I can't find it. Really, we need a lot less information to make a reasonable-looking city than the amount of information we need to make billboards and signs make sense. I don't think Midjourney wants to pay $10+ million to train a model that large.


> than the amount of information we need to make billboards and signs make sense.

The same applies to posters, letters, newspapers, and other text-heavy images, at which point the language modeling problem effectively reduces to an image generation problem.


If you ever find it, I'd love to read it!


I wonder how feasible it would be to have Midjourney mark the regions where text should be, then pass it off to GPT to propose the text to write.
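
Midjourney has no public API for this, so most of the sketch below is hypothetical: the generated image file, the marked region, and the scene description are all stand-ins. Only the GPT call and the PIL drawing are real APIs. Untested:

    from PIL import Image, ImageDraw
    from openai import OpenAI

    client = OpenAI()

    def propose_sign_text(scene: str) -> str:
        # Real API call; the prompt is just an example.
        reply = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user",
                       "content": "Suggest short, period-appropriate shop-sign "
                                  f"text for this scene: {scene}"}],
        )
        return reply.choices[0].message.content.strip()

    scene = "Paris, 1950s, rainy afternoon"
    img = Image.open("midjourney_output.png")  # stand-in: no Midjourney API exists
    box = (120, 80, 360, 130)                  # stand-in for a model-marked text region
    draw = ImageDraw.Draw(img)
    draw.rectangle(box, fill="white")
    draw.text((box[0] + 8, box[1] + 8), propose_sign_text(scene), fill="black")
    img.save("with_legible_sign.png")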


Midjourney -- a window into the past and the future


A prompt like that can already generate something great using Stable Diffusion or Midjourney. Very exciting indeed that LLMs are now similarly capable.
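
For anyone who wants to try it locally, an untested sketch with Hugging Face's diffusers library; the checkpoint is one common public Stable Diffusion model, not necessarily what anyone in this thread used:

    import torch
    from diffusers import StableDiffusionPipeline

    # Load a public Stable Diffusion checkpoint onto the GPU.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # The holodeck-style prompt from upthread.
    image = pipe("Paris, 1950s, rainy afternoon").images[0]
    image.save("paris_rainy.png")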


Using Stable Diffusion suddenly makes Picard telling the computer his tea has to be hot every time seem reasonable.


When Siri, Alexa, and Google Home came out, I was convinced voice would be the next paradigm shift in human-computer interaction, comparable to mobile, but the voice assistants fell short and I was disappointed.

Now it's clear that the shift is coming and it will revolutionize the way we interface with machines.


The image AIs are much more capable too, and I find it interesting that every technically inclined person always used to make fun of those "enhance" moments in TV shows and movies, where they would zoom into some area of a photograph or security footage. The fact that this is now actually possible (to some extent, at least) is pretty wild.
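
Today's "enhance" is essentially diffusion-based super-resolution. A rough, untested sketch with the diffusers library; the checkpoint is a real public upscaler, but the file names and prompt are made up:

    import torch
    from PIL import Image
    from diffusers import StableDiffusionUpscalePipeline

    # Load a public 4x upscaler checkpoint onto the GPU.
    pipe = StableDiffusionUpscalePipeline.from_pretrained(
        "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
    ).to("cuda")

    low_res = Image.open("security_frame.png").convert("RGB")
    low_res = low_res.resize((128, 128))  # the upscaler expects a small input

    # The prompt steers what detail gets invented during 4x upscaling.
    enhanced = pipe(prompt="a grainy security camera photo", image=low_res).images[0]
    enhanced.save("enhanced.png")

Worth remembering that the added detail is synthesized, not recovered from the original pixels.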


On a serious note, this is where these models get dangerous - when someone "zooms in" on the technology, finds something the computer created from nothing, and then takes that as irrefutable fact.



