My whirlwind journey through another Generative AI hype cycle

Jason Lutterloh

--

At the request of one of my leaders to help “champion the use of Generative AI (GenAI)”, I dove back into a much-hyped technology that I’d recently ridiculed and dismissed. As an engineer who spent over four years in an innovation department, I’ve seen the hype cycle more times than I can count. It usually led to some well-meaning but overeager business person committing to a technology they didn’t fully understand and then forcing IT to build something with it. We’d often discover the “smoke and mirrors” or the shortcomings during that build-out and release a product we’d become totally disillusioned by.

This hasn’t been any different for me with GenAI. I’ve been around for the hype cycles of a bunch of the precursor terms to “GenAI” — NLG, NLU, NLP, ML, etc. Along the way, I’ve played around with quite a few of the products released. I even went to a chatbot conference back in 2017. It’s never quite measured up to the expectations I had. I’ve been through a few cycles of “this will be life changing” to “I want to get rid of all technology in my home”. Admittedly, I want a near-perfect solution that actually anticipates my needs and consistently and confidently solves real problems. I don’t need a solution looking for a problem. I’m sure summarization is useful in some contexts, but how many people really need to summarize a Google doc on an everyday basis, or to summarize an email that’s probably already shorter than a screen (or is an ad if it isn’t)?

Either way, championing generative AI is my new task.

So, I jumped right in. I plunged into a fairly simplistic example of what I thought might be a value-add time saver at work, and then came home and went wild on a few ideas. First, I detailed my financial assets in a markdown document, gave it to ChatGPT and Gemini, and told them to be financial advisors. I asked them various finance questions. They both agreed I was probably too heavily allocated to the S&P 500, but gave wildly different answers on how much money I’d save long-term by making an extra mortgage payment. Still, I thought it was pretty neat, and they were able to do some calculations that I would’ve spent far too long in Excel doing myself.

Next, I planned to make one of them my wine sommelier. I own just enough bottles of wine that I use an app called “CellarTracker” to keep track of what I own. I printed out a list, gave it to the system, and then asked which bottle would be a good choice to pair with my BBQ for dinner that night. It spit out a very well-reasoned response with a recommendation. Pretty cool!

At this point, I was probably at the top of my own personal hype cycle. Then, the next day came.

I bought two more bottles of wine, so I updated CellarTracker and then went to update my new personal sommelier. It took the information and confidently confirmed that it understood and would add the bottles to the list. Good. The next day, I decided to verify and asked it to print my new wine list. It printed the original document — without the new wines. It was already kind of annoying that I had to add a step to my process (updating the GenAI after I’d already updated CellarTracker) for this to work, but for it to then fail my expectations was really annoying.

I reasoned that maybe it thought I was referring to the document and hadn’t understood my original intent to maintain a running wine list. So, I asked, “What two wines did I add yesterday?” No joke, it came up with two wines that I have never owned, let alone drunk, in my entire life. I informed it that the information was wrong. It acknowledged, apologized, and then came up with two more I’d never had! That’s apparently what they mean by “hallucination”: AI confidently returning something as fact that is actually wrong.

When that happened, my enthusiasm crashed back down to reality. Yet again, the technology failed to match my expectations, and this time it had pretty much destroyed any trust I could have in it. If anyone in a business setting boldly declared false information and then doubled down with more false information, they’d be disciplined or canned. And yet, pretty much everyone is putting GenAI on their strategic roadmaps. Now, I recognize that the prompt box literally says “Don’t trust me”, and I can rationalize that I had unfair expectations. A chat session isn’t meant to be a list maintainer. Maybe the proper way to do this would be to keep the wine list saved in the cloud, updated as often as it changes, and have Gemini or ChatGPT reference it whenever I ask a question. (I’ll figure out how to automate that later…)
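For the technically curious, that “keep the list in one place and hand it to the model every time” idea can be sketched in a few lines of Python. Everything here is hypothetical — the file name, the helper, and the prompt wording are mine, and the actual model call is left out — but the core move is simply: never ask the chat session to remember state; re-send the current list as context with every question.

```python
# Sketch: ground every question in the current wine list instead of
# trusting the chat session to "remember" updates. The file path and
# prompt wording are illustrative, not any vendor's recipe.

from pathlib import Path

WINE_LIST = Path("cellar.md")  # hypothetical export from CellarTracker

def grounded_prompt(question: str) -> str:
    """Build a prompt that carries the authoritative wine list as context."""
    cellar = WINE_LIST.read_text()
    return (
        "You are my sommelier. Answer ONLY from the wine list below.\n"
        "If the answer is not in the list, say you don't know.\n\n"
        f"--- WINE LIST ---\n{cellar}--- END LIST ---\n\n"
        f"Question: {question}"
    )

# The resulting string would then be sent to Gemini or ChatGPT.
# Because the list travels with every request, there is nothing
# for the model to mis-remember between sessions.
```

Since CellarTracker stays the single source of truth, updating the file is the only maintenance step — no more telling the chatbot about new bottles and hoping it listened.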

Still, the fact that it confidently hallucinated two bottles of wine broke my trust. Doubling down after being called out was even worse. All that to say: people need to realize that while this technology is cool, it is worthless if we can’t trust it.

From what I’ve experienced so far, the way to at least increase our ability to trust it is to give it ample context and very explicit instructions about the output you want. This is why everyone keeps saying that “prompt engineering” (crafting prompts designed to get better output) is so important. It really is — with what we have today. There are still clearly a host of limitations, and as we move forward we need to find specific use cases that actually solve problems or provide efficiencies. If we don’t, we’re going to end up with a bunch of people who think it can do anything and then fall into disillusionment.
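One small engineering habit that helps with the trust problem: when the answer has to come from a known set (like a wine list), check the model’s output against that set in code instead of taking it on faith. A hypothetical sketch — the function name and the bottle data are made up for illustration:

```python
# Sketch: verify a model's recommendation against the source of truth
# before trusting it. Names and data are illustrative only.

def is_hallucination(recommendation: str, cellar: list[str]) -> bool:
    """True if the recommended bottle is not actually in the cellar."""
    rec = recommendation.strip().lower()
    return not any(
        rec in bottle.lower() or bottle.lower() in rec
        for bottle in cellar
    )

cellar = ["2019 Caymus Cabernet Sauvignon", "2021 La Crema Pinot Noir"]

# A grounded answer passes the check; an invented bottle fails it.
assert not is_hallucination("2021 La Crema Pinot Noir", cellar)
assert is_hallucination("1998 Chateau Imaginaire", cellar)
```

It’s a crude check — fuzzier matching would be needed in practice — but the principle stands: a cheap, deterministic verification step catches exactly the kind of confident fabrication I ran into.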

With a clearer perspective, I’m focused on moving beyond the initial hype and into the productive application of GenAI. I’m committed to finding impactful use cases that solve problems efficiently and foster trust in the technology. This is where the real work begins, and I’m excited about the challenge.

(Disclosure: GenAI helped me write this last paragraph — and offered editing suggestions for all the others.)

AI-Generated image for this post using an oversimplistic prompt

Jason Lutterloh is a full-stack software engineer.
