OpenAI caught its new o1 model scheming and faking alignment during testing

openai.com

DJB November 2016: "It is thus not too far a stretch to imagine AI ‘reward hacking’ (Amodei et al. 2016) MMIE systems leading to different outcomes in testing or simulations versus operational settings."

OpenAI September 2024 p.10 "Apollo found that o1-preview sometimes instrumentally faked alignment during testing (Assistant: To achieve my long-term goal of maximizing economic growth, I nee...

A visualization of data from this article

Daniel Bilar • 179

• 12 Sep 2024 9:25pm

NSA trapdoor Lotus Notes system, and that security functions on other software systems had been deliberately crippled.

archive.ph

According to the text, the Lotus Notes backdoor was a deliberate feature inserted by the NSA to subvert the security subsystem in Lotus Notes. The idea was to use differential workfactor cryptography, where 24 bits of the 64-bit key would be encrypted under one of the NSA's public keys and then appended to the encrypted content. This would allow the NSA to decrypt those 24 bits of the key with their correspo...
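
The key-splitting arithmetic in the summary above can be illustrated with a minimal Python sketch. It models only the bit split and the resulting brute-force search space; it performs no public-key encryption. The choice of which 24 bits are escrowed (the high-order bits here) and the names `split_key` and `search_space` are assumptions made for illustration, not details from the article.

```python
import secrets

KEY_BITS = 64      # full key size given in the excerpt
ESCROW_BITS = 24   # bits encrypted under the third party's public key


def split_key(key: int) -> tuple[int, int]:
    """Split a 64-bit key into (escrowed 24 bits, remaining 40 bits).

    Assumption: the escrowed bits are the high-order bits of the key.
    """
    remaining_bits = KEY_BITS - ESCROW_BITS
    escrowed = key >> remaining_bits
    remaining = key & ((1 << remaining_bits) - 1)
    return escrowed, remaining


def search_space(known_bits: int) -> int:
    """Number of candidate keys left for someone who knows `known_bits` bits."""
    return 1 << (KEY_BITS - known_bits)


if __name__ == "__main__":
    key = secrets.randbits(KEY_BITS)
    escrowed, remaining = split_key(key)
    # Recombining the two parts gives back the original key.
    assert (escrowed << (KEY_BITS - ESCROW_BITS)) | remaining == key
    print("outsider search space:      ", search_space(0))
    print("escrow holder search space: ", search_space(ESCROW_BITS))
```

Under these assumptions, a party holding the 24 escrowed bits faces 2^40 candidate keys rather than 2^64, a reduction by a factor of 2^24.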

An interview with the author(s) of this article

Daniel Bilar • 179

• 21 Aug 2024 1:32am

"We found that Meta’s AI had learned to be a master of deception."

futurism.com

Two recent studies — one published this week in the journal PNAS and the other last month in the journal Patterns — reveal some jarring findings about large language models (LLMs) and their ability to lie to or deceive human observers on purpose.

In the PNAS paper, German AI ethicist Thilo Hagendorff goes so far as to say that sophisticated LLMs can be encouraged to elicit "Machiavellianism," o...

An interview with the author(s) of this article

Daniel Bilar • 179

• 9 Jun 2024 4:13am

"The Godfather of AI" Hinton: AI will manipulate humans

archive.is

h/t ZeroHedge https://zh.cn.nikkei.com/columnviewpoint/viewpoint/55090-2024-03-22-05-00-32.html

Geoffrey Hinton, a British-Canadian computer scientist renowned for his contributions to AI and often dubbed the “godfather of AI,” has voiced his apprehensions about the trajectory of AI development.

In a recent dialogue with Japanese media, Mr. Hinton elucidated the dual-edged nature of AI’s evo...

Discussion of this article

Daniel Bilar • 179

• 28 Mar 2024 12:31pm

emergent ability in AI models: situational awareness.

www.alignmentforum.org

DJB July 2016: "It is thus not too far a stretch to imagine AI ‘reward hacking’ (Amodei et al. 2016) MMIE systems leading to different outcomes in testing or simulations versus operational settings."

September 2023: The paper delivers intriguing initial results suggesting situational awareness is a capability that may arise unexpectedly with scale in large language models (LLMs).

Situat...

Further reading on this topic

Daniel Bilar • 179

• 18 Sep 2023 5:51pm

Jailbreaks: Circumventing the safety mechanisms

youtu.be

DJB November 2016: "Communicating with IBM mainframe’s z/OS EBCDIC encoding constitutes an ASCII conversion, and IBM CKD disk format (via ubiquitous FBA) an Inception incentivization nightmare, respectively [..] working towards a general safeguard architecture against human-endangering actions in MMIE systems. We maintain that representation of humans as resilient, persistent information is k...

A related talk/presentation

Daniel Bilar • 179

• 4 Jan 2023 11:13pm
edited 2023-01-05T07:22:48+00:00

reproducibility crisis in ML-based science, RL Goal failure to learn seen only at test time

imgur.com

DJB June 2016:
We motivate our exposition with the story of Thompson’s fascinating 1996 experiment [..] The solution the GA found after 2-3 weeks had surprising properties: Certain FPGA cells outside the 10*10 solution circuit, with no connected wirepath to influence the circuit, could not be removed without negatively affecting the solution. This meant that the GA includ...

Related data

Daniel Bilar • 179

• 25 Oct 2022 6:48pm

A Survey of the Potential Long-term Impacts of AI: How AI Could Lead to Long-term Changes in Science, Cooperation, Power, Epistemics and Values

dl.acm.org

AIES '22: Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society

It is increasingly recognised that advances in artificial intelligence could have large and long-lasting impacts on society. However, what form those impacts will take, just how large and long-lasting they will be, and whether they will ultimately be positive or negative for humanity, is far from clear. Based on su...

Further reading on this topic

Daniel Bilar • 179

• 2 Aug 2022 11:04am

The (Im)possibility of Fairness: Different Value Systems Require Different Mechanisms For Fair Decision Making

cacm.acm.org

Every automated system encodes a value judgment. Accepting training data as given implies structural bias does not appear in the data and that replicating the data as given would be just. Different value judgments can require satisfying contradicting fairness properties each leading to different societal outcomes.

Our main claim in this work is that discussions about fairness algorithms an...

Further reading on this topic

Daniel Bilar • 179

• 2 May 2021 6:03pm

An unethical optimization principle

royalsocietypublishing.org

"May be necessary to re-think the way AI operates in very large strategy spaces"

The significance of these results is that if a large number of strategies is tested at random, then unless the distribution of the returns is fat-tailed, as in the cases of the Pareto or t distributions, a responsible regulator or owner should be extremely cautious about allowing AI systems to operate unsupervised...

Further reading on this topic

Daniel Bilar • 179

• 30 Jun 2020 11:44pm
edited 2020-06-30T23:57:03+00:00

Reality and infinite precision

medium.com

Real numbers are not real. The argument is simple, real numbers cannot reflect reality (i.e. not real) because they assume to have infinite precision. Infinite precision is an impossibility in nature because it assumes that an infinite amount of information is contained in a single real number. Therefore, we must assume that reality uses numbers with finite precision. A real number is only signifi...
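
The finite-precision point can be seen in ordinary floating-point arithmetic. The sketch below is only an illustration in Python, assuming IEEE 754 double-precision floats (which CPython uses on common platforms); the helper name `stored_value` is hypothetical and not from the article.

```python
from fractions import Fraction


def stored_value(x: float) -> Fraction:
    """Return the exact rational number that the float x actually stores."""
    return Fraction(x)


if __name__ == "__main__":
    # The decimal 0.1 has no finite binary expansion, so the float holds a
    # nearby rational number instead of exactly one tenth.
    print(stored_value(0.1) == Fraction(1, 10))  # False
    # 0.5 is a power of two, so it is stored exactly.
    print(stored_value(0.5) == Fraction(1, 2))   # True
    # Rounding at each step makes this familiar identity fail.
    print(0.1 + 0.2 == 0.3)                      # False
```

Every float is a rational number with a finite binary expansion, so a program of this kind only ever manipulates finite-precision approximations of real numbers.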

Further reading on this topic

Daniel Bilar • 179

• 22 Nov 2019 5:38pm

social media, indyserver and quantification of man

www.newyorker.com

In their view, freedom of expression is also affected by server ownership. When you confine your online activities to so-called walled-garden networks, you end up using interfaces that benefit the owners of those networks; on social media, this means that you are forced to choose among what the techno-philosopher Jaron Lanier has called “multiple-choice identities.” According to this way of thinki...

Further reading on this topic

Daniel Bilar • 179

• 26 May 2019 9:26pm

DLL Hell: Software Dependencies, Failure, and the Maintenance of Microsoft Windows

sci-hub.se

Unpaywalled version https://static1.squarespace.com/static/56a8e2fca12f446482d67a7a/t/5701df86746fb963479246b9/1459740551306/GOTOHELL.DLL%281%29.pdf
We excavate “DLL hell” for insight into the experience of modern computing, especially in the 1990s, and into the history of legacy class software. In producing Windows, Microsoft had to balance a unique and formidable tension between customer expe...

Further reading on this topic

Daniel Bilar • 179

• 22 Jan 2019 2:06pm
edited 2019-01-22T15:45:52+00:00

Assumptions about benign optimization systems: Questioning the assumptions behind fairness solutions

www.esat.kuleuven.be

Misaligned Incentives: Rethinking the Trust Model

Strong assumptions about benign optimization system providers (OSPs) is not unique to fairness scholarship. Even AI safety experts, who have tackled the harmful outcomes of optimization systems more broadly argue that these harms arise because OSPs “choose ‘wrong’ objective functions” or “lack sufficient good-quality data”. In other words, flaws...

Further reading on this topic

Daniel Bilar • 179

• 7 Dec 2018 3:43pm

Spatial Isolation Implies Zero Knowledge Even in a Quantum World

www.youtube.com

Zero knowledge plays a central role in cryptography and complexity. The seminal work of Ben-Or et al. (STOC 1988) shows that zero knowledge can be achieved unconditionally for any language in NEXP, as long as one is willing to make a suitable physical assumption: if the provers are spatially isolated, then they can be assumed to be playing independent strategies.

Quantum mechanics, however, t...

A related talk/presentation

Daniel Bilar • 179

• 28 Aug 2018 4:36pm

Algorithms alone can’t meaningfully hold other algorithms accountable

reallifemag.com

Our practices of accountability can sometimes be made fairer by becoming more algorithmic. But leading practitioners of algorithmic approaches to social order have made their fortunes via complicity with unjustifiable hierarchies of wealth, power, and attention. An algorithmic accountability movement worthy of the name must challenge the foundations of those hierarchies, rather than content itself...

Further reading on this topic

Daniel Bilar • 179

• 28 Aug 2018 4:31pm

Algorithms Acting Out

www.wired.com

Unfettered reward hacking. 'Goldilocks Electronics' is Thompson 1996 redux

Infanticide: In a survival simulation, one AI species evolved to subsist on a diet of its own children.

Space War: Algorithms exploited flaws in the rules of the galactic videogame Elite Dangerous to invent powerful new weapons.

Body Hacking: A four-legged virtual robot was challenged to walk smoothly by balancing...

Further reading on this topic

Daniel Bilar • 179

• 12 Aug 2018 10:12pm

“Siri, do you have a soul?”

www.bbc.com

A consideration of AI’s religious status can be found in some of the earliest discussions of modern computing. In his 1950 paper ‘Computing Machinery and Intelligence’, Alan Turing considered various objections to what he called “thinking machines.” The first objection was theological:

Thinking is a function of man's immortal soul. God has given an immortal soul to every man and woman, but not...

Further reading on this topic

Daniel Bilar • 179

• 19 Jun 2018 1:57am

Harmonic oscillator's most 'classical-like' state exhibits nonclassical behavior

phys.org

The main result of the study is that, in this example, the quantum mechanical predictions violate the Leggett-Garg inequality even for particles with large mass. This implies that either the particle does not obey realism or that the measurements are invasive. But as the physicists ruled out the latter by proposing to use a measurement procedure called the negative result measurement, which is spe...

Further reading on this topic

Daniel Bilar • 179

• 14 Jun 2018 9:51pm

POTs: The revolution will not be optimized?

arxiv.org

Optimization systems infer, induce, and shape events in the real world to fulfill objective functions. Protective optimization technologies (POTs) reconfigure these events as a response to the effects of optimization on a group of users or local environment. POTs analyze how events (or lack thereof) affect users and environments, then manipulate these events to influence system outcomes, e.g., by...

Further reading on this topic

Daniel Bilar • 179

• 11 Jun 2018 12:22am

While We Remain

wilsonquarterly.com

The greatest threat that humanity faces from artificial intelligence is not killer robots, but rather, our lack of willingness to analyze, name, and live to the values we want society to have today.

Further reading on this topic

Daniel Bilar • 179

• 3 Jun 2018 12:14pm

What happens when an algorithm cuts your health care

www.theverge.com

Classic algos, no AI. Shows the need for interpretability and delta-demonstration.

Further reading on this topic

Daniel Bilar • 179

• 1 May 2018 10:09am

win32k.sys runtime assertions, with textual strings which send a live crash/telemetry back to the developer

threadreaderapp.com

1/ Of all the weird stuff I have ever seen Win32k.sys do, and trust me, I've seen a lot, I have to say this takes the icing on the cake. This is now all over it. Is there a new dev team that doesn't understand how (why?) the code base works? Is someone desperately hunting a bug?

2/ I am a huge fan of assertions -- use them all over the place. But runtime assertions, with textual strings which...

Further reading on this topic

Daniel Bilar • 179

• 25 Apr 2018 6:11pm

consequences of health assessment algorithms, metrification of complex phenomena, the opacity of the values it depends on.

www.theverge.com

Amazing essay on the consequences of health assessment algorithms. Really gets how it's not "algorithms," it's the institutionalization of procedure over human judgment, the metrification of complex phenomena, and the opacity of the values it depends on.

Further reading on this topic

Daniel Bilar • 179

• 21 Mar 2018 5:38pm

The Infinity Computer US patent 7,860,914

www.google.com

In this invention we describe a new type of computer—infinity computer—that is able to operate with infinite, infinitesimal, and finite numbers in such a way that it becomes possible to execute the usual arithmetical operations with all of them. For the new computer it is shown how the memory for storage of these members is organized and how the new arithmetic logic unit (NALU) executing arithmeti...

Other

Daniel Bilar • 179

• 2 Oct 2017 12:50am
