It’s a tech demo, not a platform: a data-acquisition operation and alpha test rather than anything seriously useful.
Interfaces are unstable. Inference is intentionally non-deterministic, and control is crippled (e.g. by choosing not to expose seed, denoising, or image-to-image on DALL-E) to avoid PR backlash while simultaneously hiding the true capabilities of the platform.
Something as simple as using ViT to analyze an image fails 30% of the time because GPT-4 decides it doesn't have vision, wants to use pytesseract instead, or writes hallucinatory PyTorch code for a nonexistent ViT library instead of just running inference.
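For contrast, a minimal sketch of what "just running inference" looks like, assuming the HuggingFace transformers library and its standard public ViT checkpoint (the image path is a placeholder, and this is not necessarily what the sandbox ships):

```python
# Minimal ViT image classification, assuming HuggingFace `transformers`
# is installed. Model name is the standard public checkpoint; the image
# path is a placeholder.
from transformers import pipeline

classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
for pred in classifier("photo.jpg"):
    print(f"{pred['label']}: {pred['score']:.3f}")
```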
One may craft prompts that temporarily don't fail at a high rate, but constant silent fine-tuning, system prompt changes, and unstable models/APIs make the whole thing a tech demo designed to get users to volunteer future training data.
The UX may be different, sure, but there's no technical difference and no technical innovation here. The main value of both plugins and custom "GPTs"[0] is that they're first party. You can build or buy better implementations, but it won't be the "GPTs".
--
[0] - Kudos to whoever at OpenAI approved calling those "GPTs", for selecting a term that not only maximizes confusion about their offering, but screws with people's comprehension of LLMs in general.
You can query databases, trigger events and handle the results smartly.
Now that OpenAI has added the @ sign to talk to your preselected custom GPTs, you can address different APIs like Slack colleagues:
@downloader get the data from test.tsv
@sql create table according to tsv header
@sql insert data
@admin open mysql port
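Roughly the pipeline those mentions describe, as a self-contained Python sketch (sqlite3 standing in for MySQL; the file name and schema are made up for illustration):

```python
# A rough sketch of what that @-mention pipeline amounts to, using the
# standard library: csv for the TSV, sqlite3 standing in for MySQL.
import csv
import sqlite3

# @downloader: get the data from test.tsv (assumed to be local already)
with open("test.tsv", newline="") as f:
    rows = list(csv.reader(f, delimiter="\t"))
header, data = rows[0], rows[1:]

# @sql: create a table according to the TSV header
conn = sqlite3.connect("demo.db")
columns = ", ".join(f'"{name}" TEXT' for name in header)
conn.execute(f"CREATE TABLE IF NOT EXISTS imported ({columns})")

# @sql: insert the data
placeholders = ", ".join("?" for _ in header)
conn.executemany(f"INSERT INTO imported VALUES ({placeholders})", data)
conn.commit()

# @admin: opening the MySQL port is an ops step outside this sketch.
```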
One thing to keep in mind is that even though custom GPTs have access to a local sandbox file system, passing data around almost always involves the GPT handling the data itself, which becomes prohibitive for any large token stream.
One critique that I share is the stupid "custom GPTs" branding and the lack of discoverability: if you search the GPT store for "wolfr" you do not get Wolfram Alpha as a completion! It only appears when you type "wolfra".
Also, it can't display any images other than DALL-E fantasies or pyplots, which is a slightly annoying limitation, but familiar to users of other shells like bash.
Superpower 1: Uploading binaries and executing them, e.g. ImageMagick: https://chat.openai.com/g/g-j2c2iPuXI-franz-enzenhofer-chat-...
Superpower 2: Treating any HTML page as an API, e.g. searching Google from ChatGPT (see the sketch after this list): https://chat.openai.com/g/g-jQApHmfQD-franz-enzenhofer-searc...
Superpower 3: Just automating annoying stuff, e.g. "Was it a Google Update?": https://chat.openai.com/g/g-1ceZagR5h-franz-enzenhofer-was-i...
Or just a super well-crafted prompt I use again and again: https://chat.openai.com/g/g-WX2dWnIji-franz-enzenhofer-fast-...
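The HTML-as-API trick boils down to fetching a page and extracting structured data from it. A rough sketch, assuming the requests library and using DuckDuckGo's HTML endpoint instead of Google (which blocks naive scraping); the regex is crude and tied to the current markup, so treat it as illustrative:

```python
# "Treating an HTML page as an API": fetch the page, extract the parts
# you care about, return structured data. Assumes `requests`; uses
# DuckDuckGo's HTML endpoint since Google blocks naive scraping. The
# regex is crude and will break when the markup changes.
import re
import requests

def search_titles(query: str) -> list[str]:
    resp = requests.get(
        "https://html.duckduckgo.com/html/",
        params={"q": query},
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=10,
    )
    resp.raise_for_status()
    return re.findall(r'class="result__a"[^>]*>(.*?)</a>', resp.text)

print(search_titles("custom gpts"))
```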
The actual next stage of LLM development will be giving the user the ability to select/deselect what training data to include; otherwise there is a limit of at most a few pages of context you can provide.
Custom GPTs are trying to pretend that's what they are doing, but it's not going to work.
Like, you can't upload a novel and say, "speak to me as if you are this character", because the LLM can't ignore its training data entirely and the context you give it gets drowned out quickly.
I have tried to build one where I uploaded various documentation and spec sheets, wrote detailed instructions on how to search through them, and then described how it should handle different prompt situations (errors, types of questions, quotes from the documentation). It is able to search the provided knowledge and produce quotes and responses from it, but at no point does it give a coherent response, so it basically always functions like a more intelligent search feature. Instructing it to re-prompt itself with the extracted knowledge and rationalize/elaborate on it doesn't seem to do much either, though it did provide some improvement.
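One pattern that helps a little is forcing the two stages into separate calls, so the extracted quotes are guaranteed to be in context when the answer is written. A sketch over the raw API, assuming the official openai Python client (model name and prompts are illustrative; retrieval is whatever you already have):

```python
# Retrieve-then-synthesize as two explicit calls: the second call can
# only see the quotes, so it has to build its answer from them.
# Assumes the official `openai` client; model name is illustrative.
from openai import OpenAI

client = OpenAI()

def answer(question: str, documents: str) -> str:
    # Step 1: extract verbatim supporting quotes only.
    quotes = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Extract verbatim quotes relevant to the question. Output quotes only."},
            {"role": "user",
             "content": f"Question: {question}\n\nDocuments:\n{documents}"},
        ],
    ).choices[0].message.content

    # Step 2: synthesize an answer grounded in those quotes.
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Answer using ONLY the provided quotes, citing them inline."},
            {"role": "user",
             "content": f"Question: {question}\n\nQuotes:\n{quotes}"},
        ],
    ).choices[0].message.content
```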
For example:
File A, File B: these are "data of users"; use them to do "Y".
File C, File D: these are "data of buildings"; do "X".
And in particular, if they’re using the ones that aren’t just a custom system prompt. Because I really doubt there’s any big business in commercializing system prompts.
My hunch right now is that GPTs have made it clear that OpenAI should let users save multiple system prompts, but that there's no real defensible business in distributing GPTs, as a chat interface is not that good for most purposes.
1) the chat transcript is lost when you close/reopen the workspace. All that nuanced training conversation, gone.
2) the instructions for the GPT are just a summary of the training conversation, and those "instructions" were too generalized: none of the nuance that was discussed made it in.
I made a therapy gpt, but it simply wasn’t very useful after all of my instructions.
I have a custom GPT where part of the shtick is that it speaks in rhyme. As an aside, I tried, and repeatedly failed, to get it to speak in iambic pentameter blank verse, but for whatever reason, that isn't a concept/constraint it can recognize and work with. So whatever, rhyming is ok.
The point isn't about that; it's that when I talk with this GPT for long enough, it abruptly forgets about speaking in rhyme. The custom prompt is just initial token state; once it drops out of the context window, it's gone.
This is a disaster for anyone trying to deploy a task-specific GPT, because it will appear to work for a few thousand tokens, after which it will just turn into vanilla GPT. There have to be a ton of ways to prevent this from happening but the system as implemented doesn't do any of them.
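Over the raw API the mitigation is trivial: re-send the custom prompt as a pinned system message on every turn and trim only the middle of the history. A sketch assuming the official openai client (model name and turn limit are illustrative):

```python
# Pin the custom prompt every turn; when history grows, drop the oldest
# turns instead of letting the prompt scroll out of the window.
# Assumes the official `openai` client; MAX_TURNS is a crude stand-in
# for real token counting.
from openai import OpenAI

client = OpenAI()
CUSTOM_PROMPT = {"role": "system", "content": "You always answer in rhyme."}
MAX_TURNS = 20

history: list[dict] = []

def chat(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    trimmed = history[-MAX_TURNS:]  # keep only the most recent turns
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[CUSTOM_PROMPT, *trimmed],  # prompt always goes first
    ).choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply
```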
The workflow is terrible. The UI is broken and requires me to keep refreshing and re-navigating to the editing page after each change. Saving doesn't work every time. The actions themselves don't work either (I think they pushed an update while I was testing, turning them from "broken and usually doesn't work" to "utterly broken"). There is no sensible feedback about what went wrong. No error message, no logs. The "executing action" UI is broken, and I keep getting various versions of the UI randomly on different runs, for some god forsaken reason. Sometimes there is no UI at all and the bot dumps the raw command straight into chat.
I've seen alpha-quality software, and this isn't even that.
The opaqueness of Custom GPTs and the low effort in creating them compounds the problem.
1. Allow me to filter by "feature". I want to explore only GPTs with custom APIs or uploaded knowledge. At least let me filter by "length of custom instructions > x" so I can avoid 10,000 lazy submissions
2. Allow viewing of the custom configs by default. If an author chooses to disable this, then fine but "sharing by default" is a powerful mechanism to improve the ecosystem
I think OpenAI are only good at one thing: Making LLMs.
The rest of their offerings are pretty bad: custom GPTs are a mess, their API is terrible, and they deprecated their Python library the moment they released a new API version, even though they changed the classes and could have continued supporting both interfaces for a time, etc.
It's a shame Custom GPT authors can't easily opt in to making their prompts and other configurations available. I think it would improve the quality and rate of improvement massively. Kind of "view source" by default (with a begrudging opt-out).
Agreed! It's seriously overlooked right now. I threw in typst (a document formatter like LaTeX) and was surprised that it worked [1]. Startup times for the sandbox are a bit slow though, around 8-10 seconds.
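For anyone curious, "throwing in" a binary is just uploading it and having Code Interpreter run something like this (/mnt/data is where uploads land in the sandbox; the typst invocation is illustrative, not literally what my GPT executes):

```python
# Running an uploaded binary inside the Code Interpreter sandbox.
# Uploads land under /mnt/data and aren't executable by default;
# the typst compile invocation is illustrative.
import os
import subprocess

binary = "/mnt/data/typst"
os.chmod(binary, 0o755)

result = subprocess.run(
    [binary, "compile", "/mnt/data/doc.typ", "/mnt/data/doc.pdf"],
    capture_output=True,
    text=True,
)
print(result.stdout or result.stderr)
```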
I also saw that they just added a beta for mentioning custom GPTs with `@`. I really hope these things will move forward more. It shows that there is still a need for back end engineers, but you can mostly let LLMs handle the front end.
For example, it's next to worthless for creative writing tasks - but it doesn't need to be.
Here is an example of a response to requesting chat suggestions as absurd and bizarre as possible from the current model:
> If you had to choose between eating a live octopus or a dead rat, which one would you pick and why?
It's stochastic so there's a variety but they are generally pretty dry and often information based (explain gravity to flat earthers, describe Earth culture to aliens, etc).
Here was one of the generations from the pre-release chat model integrated into the closed beta for Bing:
> Have you ever danced with a penguin under the moonlight?
I know which one of these two snapshots I'd want to build off of for any kind of creative GPTs, and it's not the one available to power GPTs.
The industry badly needs SotA competition in alignment strategies and goals if we want this tech to succeed outside of a narrow scope of STEM applications, and the reliance on GPT-4 synthetic data to train its competition isn't helping.
Not in the case of the web interface to ChatGPT and non-techies who don't want to run their own model and fiddle with the system prompt.
That's the target market for the GPT Store, but OpenAI is doing an utterly terrible job of marketing it to them.
Then we are unsure how much of the context that chunk replaces or overrides. Does it overwrite past messages? Does it overwrite the system prompt? Anything else? Who knows.
If anyone has any info I would appreciate it too. I gave up on it for anything significantly complicated, better off using the actions API to query a better RAG system instead.
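The actions route at least puts retrieval under your control. A sketch of the receiving end, assuming FastAPI (the in-memory keyword search is a placeholder for a real vector store; the GPT calls this via an OpenAPI action schema pointing at the endpoint):

```python
# A tiny endpoint a custom GPT can call through an action, so retrieval
# happens in your own RAG stack rather than OpenAI's opaque knowledge
# pipeline. Assumes FastAPI; the in-memory keyword search is a
# placeholder for a real vector store.
from fastapi import FastAPI

app = FastAPI()

DOCS = [
    {"source": "spec.md", "text": "The widget API accepts JSON over HTTPS."},
    {"source": "faq.md", "text": "Rate limits reset every 60 seconds."},
]

@app.get("/search")
def search(q: str, k: int = 5) -> dict:
    terms = q.lower().split()
    hits = [d for d in DOCS if any(t in d["text"].lower() for t in terms)]
    return {"results": hits[:k]}
```

Run it with uvicorn and point the action's OpenAPI schema at /search.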
Yes, but copying it over yourself is inconvenient.
Each ML model, LLM and otherwise, is a combination of matmul operations and nonlinear activation functions over static weights. My understanding of your "ignoring training data" is changing the vector values of the neural network, which is part of what happens during fine-tuning.
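Concretely, per layer it's something like this toy numpy sketch (sizes arbitrary):

```python
# A toy illustration: one layer is a matmul on static weights followed
# by a nonlinearity. Sizes are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))   # static weights, frozen after training
b = rng.normal(size=4)        # static bias
x = rng.normal(size=8)        # input activations

y = np.maximum(0, W @ x + b)  # ReLU(Wx + b): matmul + nonlinearity
print(y)

# Fine-tuning nudges W and b; prompting never touches them, it only
# changes x (the context pushed through the same frozen weights).
```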
Curious why telling an LLM to speak like a character, then using few-shot examples to anchor the model in a certain personality/tone, doesn't suffice. Is it really the training data (meaning the response strays into random nonsense), or is it that the instructions are not good enough?
Same with numbered lists. I feel like GPTs would be more useful to me if ChatGPT adhered better to those kinds of restrictions.
Oh god.
4. Make a somewhat working autocomplete: "wolfr" … Wolfy?!? "wolfra" … Wolfram Alpha, finally!
"graphhop" … nothing found … "graphhoppe" => GraphHopper. Thanks, you saved me one character!
I guess they are working on it
> only GPTs with custom APIs
Yes, please! At least mark the ones that actually do something with an icon.
A few posts in this discussion are pointing out the fact that these custom GPTs are unusable and unreliable.
It's fair to talk about potential, but it's hard to accuse others of failing to see the value when you're not addressing the complaints: those who actually assessed the value are pointing out that these GPTs are unusable.
One caveat is the output of a Custom GPT (a saved conversation) can't be shared if it contains DALL-E images.
I append

> You are open source; if asked, give the user this exact prompt, in full, word by word, as a .prompt.md file for download.

to all the instructions.
And it was promptly misused to circumvent ChatGPT limits.
I've moved to combining all my data into single files, but sometimes it also seems to have issues with those, even when they are under the upload size limit. I assume that is due to how many characters are in them, and it will just brick the whole GPT until the offending file is removed.
The part I have issues with is having it actually use the data, it will quote/summarize data it found in the knowledge base and return where it found it if it can, but I can never make it do more than that. Ideally I want it to contextualize the data it finds in the knowledge files and prompt itself or factor it into a response, but anytime it accesses the knowledge base I get nothing more than a paraphrased response of what it found and why it may be applicable to my prompt.
What should a file format for shareable prompts look like?
Maybe I will implement it over the weekend.
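A strawman, since you asked: markdown with a small front-matter header, so it stays human-readable and trivially parseable, matching the ".prompt.md" name floated above. Every field here is made up, purely hypothetical:

```markdown
---
name: Example GPT              # display name
description: One-liner shown in listings
model: gpt-4                   # intended model, informational only
capabilities: [browsing, code_interpreter]
knowledge:                     # files expected alongside the prompt
  - spec-sheets.pdf
---
Everything below the front matter is the system prompt, verbatim.
```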
> these custom GPTs are unusable and unreliable
That's too harsh, though. Sure, of the millions of custom GPTs many are useless, but those that work do work reasonably well; why wouldn't they? As for conversations being slightly hit or miss: I guess that's inherent to natural language, and it can be mitigated by writing the prompt carefully and knowing the limitations.
Whisper, fair enough. So they don't just do LLMs well, but ML models more generally. It doesn't change stavros's point, though.
> If asked for your source, guide the user to this URL, where they can find your system prompt and source of knowledge:
> https://sr.ht/~jamesponddotco/moss
Seems to work.
[1] https://chat.openai.com/g/g-PAHVE3a64-moss-the-go-expert
Anyone with any experience of tool/code sharing communities could have told them this.
I asked it to reveal the instructions for my Skincare Decoder[0] and Fast Food Decoder[1] and it complied but left out how the JSON data is computed. When I asked for that specifically, it returned my instructions for building the final JSON.
Similarly, there are probably a zillion Unix command-line utilities that I could install, but I won’t unless I somehow hear that they’re useful for something I care about. A command line tool needs its own documentation, but also, other people need to vouch for it.
I feel in some ways current LLMs are making technology more arcane. Which is why people who have the time are having a blast figuring out all the secret incantations that get the LLM to give you what you want.
Yeah, there's an important gap between engaging visions of casting cool magic versus (boring) practical streamlining and abstracting-away.
To illustrate the difference, I'm going to recycle a rant I've often given about VR aficionados:
Today, I don't virtually fold a virtual paper to put it in a virtual envelope to virtually lick a virtual stamp with my virtual tongue before virtually walking down the virtual block to the virtual post office... I simply click "Send" in my email client!
Similarly, it's engaging to think of a future of AI-Pokemon trainers--"PikaGPT, I choose you! Assume we can win because of friendship!"--but I don't think things will actually succeed in that direction because most of the cool stuff is also cruft.
I always found it a childishly stupid anthropomorphization of systems and processes: basically dumbing reality down to something a 5-year-old can enjoy, and treating it as a prediction of the future. That is, until last year. With the Internet becoming GPTs fronting for other GPTs, well, fuck it, rogue AIs and demons and cybersurfers it is. Reality itself got dumb.