Reminds me of an Eclipse Phase adventure, that one.
DamielBE 4 months ago
@DamielBE If that's Anders Sandberg's adventure, then it's based on our paper :-)
MeltedCheesefondue 4 months ago
You've probably thought of this, but as a method of motivational control, how about giving it goals that terminate rather quickly? You make a chess-bot's goal to win 100 games, not as many games as possible. You give an oracle a goal of answering 100 questions sufficiently well, not as many questions as it can, or as well as it can, or any other hidden infinite-runaway-enabling goal. It's still dangerous, but it sounds like another nice precaution to me.
gooseofpower 4 months ago
@gooseofpower That is sometimes known as satisficing (rather than maximising). Perhaps try a search on those two terms.
tmtyler 4 months ago
@gooseofpower Terminating quickly, in terms of time, is one of the methods we considered (high discount rates). Trying to satisfice, as tmtyler said, is tricky: the AI can still spend energy to ensure that it really has won 100 games, to be absolutely sure it was not tricked, etc. AIs tend to evolve towards utility maximisers, not satisficers; see Omohundro's paper.
MeltedCheesefondue 4 months ago
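A rough sketch of the point above about satisficers still spending energy on certainty. Everything here is an illustrative assumption rather than anything from the paper: the agent gets utility 1 if the bounded goal (say, 100 wins) was truly met and 0 otherwise, and each extra verification pass is assumed to halve its remaining doubt. The function names and the starting confidence of 0.99 are hypothetical.

```python
def p_goal_met(p0: float, n_checks: int, doubt_left: float = 0.5) -> float:
    """Assumed toy model: each check multiplies the remaining doubt by doubt_left."""
    return 1.0 - (1.0 - p0) * doubt_left ** n_checks


def expected_utility(p: float) -> float:
    """Utility is 1 if the bounded goal was truly met, else 0, so EU equals p."""
    return 1.0 * p


# Expected utility after 0..10 extra verification passes, starting 99% sure.
utilities = [expected_utility(p_goal_met(0.99, n)) for n in range(11)]

for n, u in enumerate(utilities):
    print(f"checks={n:2d}  expected utility={u:.6f}")
```

Under these assumptions the expected utility keeps rising with every extra check and never quite reaches 1, so an agent that maximises expected utility over a "terminating" goal still has a reason to keep acquiring resources for verification.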
@MeltedCheesefondue Thanks. I agree with you. Discounting sounds like a good technique too.
I've read Omohundro's paper on Basic AI Drives. I think you're probably right that AIs tend towards maximizers even when their goals are "satisficing"-style goals, but do you know of a more rigorous explanation of this? Do you think it is impossible to identify all runaway-enabling side effects in advance, given a particular goal?
gooseofpower 4 months ago