Like most people, I’ve been futzing around with ChatGPT and Midjourney now and then. Rationally, what ChatGPT can do is mind-blowing, but Midjourney is breathtaking on a different level. It could be because Midjourney works in a visual medium. I don’t think I could ever come up with what Midjourney produces. Or maybe it’s that Midjourney can generate images mimicking so many different painters, illustrators, and movements.
My initial prompt involved Darth Vader, Praetorian Guards, and cyberpunk. Honestly, I can’t even completely retrace my steps. After many variations, I found the character at the bottom left interesting.
And then stormtroopers got thrown into the mix. I liked the bottom left.
More and more variations, and I asked Midjourney to keep the details. Honestly, all of these are cool. I liked the top row.
When this appeared as the output, I was mesmerized.
Eventually, I ended up with this:
I could wax poetic, but what’s the point?
From a productivity angle, there are two related issues. First, no matter how good you are at writing prompts, it takes skill (and luck) to communicate what you want and get the output you expect. That process would be more straightforward talking face-to-face with a designer. Second, tracking how you got to a certain point and retracing your steps is almost impossible. With generative AI, every output is unique, and there is no way to go back. Using it for productive work requires a repeatable process that yields repeatable output. Midjourney and other generative AIs can’t do this in their current form. Of course, they weren’t built for that. Still, it could become possible if the parameters and models were constrained within defined boundaries. But then the generative part would lose much of its value.
What is the path of evolution here? Say you have generative AIs that model existing content and continuously learn from what we create. Does it all end up looking the same? Take websites as an example: everyone starts with a blank canvas for visual design. Still, they all end up as grids, with one or more big images at the top, two or three columns, a wide section, then two or three more columns, and repeat.
Will we all start out experimenting but eventually converge on some level of blandness?