A festival of truth-seeking, optimization, and blogging. We'll have writing workshops, rationality classes, puzzle hunts, and thoughtful conversations across a sprawling fractal campus of nooks and whiteboards.
Ilya Sutskever and Jan Leike have resigned. They led OpenAI's alignment work. Superalignment will now be led by John Schulman, it seems. Jakub Pachocki replaced Sutskever as Chief Scientist.
Reasons are unclear (as usual when safety people leave OpenAI).
The NYT piece and others I've seen don't really have details. Archive of NYT if you want to read it anyway.
OpenAI announced Sutskever's departure in a blogpost.
In my opinion, a class action filed by all employees allegedly prejudiced by the NDAs and gag orders (I say "allegedly", reserving the right to revise that wording if new information arises), seeking to terminate these agreements, would be extremely effective.
An arbitral tribunal, rather than a court or internal bargaining, would be far more likely to grant compensation to ex-employees.
See Trump's NDA termination.
Crosspost from my blog.
If you spend a lot of time in the blogosphere, you’ll find a great many people expressing contrarian views. If you hang out in the circles that I do, you’ll probably have heard Yudkowsky say that dieting doesn’t really work, Guzey say that sleep is overrated, Hanson argue that medicine doesn’t improve health, various people argue for the lab leak, others argue for hereditarianism, and Caplan argue that mental illness is mostly just aberrant preferences and that education doesn’t work. Often, very smart people, like Robin Hanson, will write long posts defending these views, other people will respond with criticisms, and it all becomes such a tangled mess that you don’t really know what to think.
For...
Do you happen to have a copy of it that you can share?
I have liked music very much since I was a teenager. I spent many hours late at night in Soulseek chat rooms talking about and sharing music with my online friends. So, I tend to just have some music floating around in my head on any given day. But, I never learned to play any instrument, or use any digital audio software. It just didn't catch my interest.
My wife learned to play piano as a kid, so we happen to have a keyboard sitting around in our apartment. One day I was bored so I decided to just see whether I could figure out how to play some random song that I was thinking about right then. I found I was easily able to reconstitute a piano...
The first two reasons that come to my mind are (1) other instruments have much more career incentive to do so (in that there are many more jobs for classical violinists or violin ensembles than for classical guitarists), and (2) it’s possible to have a much more successful career as a guitarist knowing only chord positions and not having a more detailed understanding of the fretboard, than it is with other instruments where a knowledge of how to play complicated melodies is required.
This is a D&D.Sci scenario: a puzzle where players are given a dataset to analyze and an objective to pursue using information from that dataset.
Duke Arado’s obsession with physics-defying architecture has caused him to run into a small problem. His problem is not – he affirms – that his interest has in any way waned: the menagerie of fantastical buildings which dot his territories attest to this, and he treasures each new time-bending tower or non-Euclidean mansion as much as the first. Nor – he assuages – is it that he’s having trouble finding talent: while it’s true that no individual has ever managed to design more than one impossible structure, it’s also true that he scarcely goes a week without some architect arriving at his door, haunted...
Looks like architects apprenticed under B. Johnson or P. Stamatin always make impossible structures.
Architects apprenticed under M. Escher, R. Penrose or T. Geisel never do.
Self-taught architects sometimes do and sometimes don't. It doesn't initially look promising to predict which of this group will or won't: there are many cases of similar proposals sometimes succeeding and sometimes failing.
Fortunately, we do have 5 architects (D, E, G, H, K) apprenticed under B. Johnson or P. Stamatin, so we can pick the 4 of them likely to have the lowest-cost proposals.
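The filter-then-pick-cheapest logic above can be sketched in pandas. This is a minimal illustration with made-up data: the column names (`architect`, `mentor`, `cost`) and the costs are hypothetical stand-ins for whatever the actual scenario dataset uses.

```python
import pandas as pd

# Toy stand-in for the scenario dataset; real column names and values differ.
df = pd.DataFrame({
    "architect": ["D", "E", "G", "H", "K", "X"],
    "mentor": ["B. Johnson", "P. Stamatin", "B. Johnson",
               "P. Stamatin", "B. Johnson", "M. Escher"],
    "cost": [120, 95, 150, 80, 110, 60],
})

# Keep only architects whose mentors have a perfect record of
# producing impossible structures, then take the four cheapest proposals.
reliable = df[df["mentor"].isin(["B. Johnson", "P. Stamatin"])]
picks = reliable.nsmallest(4, "cost")["architect"].tolist()
print(picks)  # -> ['H', 'E', 'K', 'D']
```

Note that X is excluded despite having the cheapest proposal, since an M. Escher apprenticeship predicts failure in this (toy) rule set.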
Cos
Here’s a conception that I have about sacredness, divinity, and religion.
There’s a sense in which love and friendship didn’t have to exist.
If you look at the animal kingdom, you see all kinds of solitary species, animals that only come together for mating. Members of social species – such as humans – have companionship and cooperation, but many species do quite well without being social.
In theory, you could imagine a world with no social species at all.
In theory, you could imagine a species of intelligent humanoids akin to Tolkien’s orcs. Looking out purely for themselves, willing to kill anyone else if they got away with it and it benefited them.
And then in another sense, some versions of love and friendship do have to exist.
Social species evolved for a...
Thank you for your thoughts.
I often reflect that, in my attempts to model life on this planet from all that I have observed, experienced, read, and reflected on, it seems like there is a persistent "force" that is supporting life at ever greater levels of organization and complexity. The fields, circumstances, and conditions of this planet seem to give chances to any strategy for organizing on top of what has already been organized. Trillions of chances over billions of years, with almost as many failures. Almost.
I'm not the most science-y, but it seems t...
I expect it would be useful when developing an understanding of the language used on LW.
We don't have a live count, but we have a one-time analysis from late 2023: https://www.lesswrong.com/posts/WYqixmisE6dQjHPT8/2022-and-all-time-posts-by-pingback-count
My guess is not much has changed since then, so I think that's basically the answer.
Epistemic status: I wrote this in August 2023, got some feedback I didn't manage to incorporate very well, and then never published it. There's been less discussion of overhang risk recently but I don't see any reason to keep sitting on it. Still broadly endorsed, though there's a mention of a "recent" hardware shortage which might be a bit dated.
I think arguments about the risks of overhangs are often unclear about what type of argument is being made. Various types of arguments that I've seen include:
This seems to be arguing that the big labs are doing some obviously-inefficient R&D in terms of advancing capabilities, and that government intervention risks accidentally redirecting them towards much more effective R&D directions. I am skeptical.
...
- If such training runs are not dangerous then the AI safety group loses credibility.
- It could give a false sense of security when a different arch requiring much less training appears and is much more dangerous than the largest LLM.
- It removes the chance to learn alignment and safety details.
This is the fourth in a sequence of posts taken from my recent report: Why Did Environmentalism Become Partisan?
This post has more of my personal opinions than previous posts or the report itself.
Other movements should try to avoid becoming as partisan as the environmental movement. Partisanship did not make environmentalism more popular; it made legislation more difficult to pass, and it resulted in fluctuating executive action. Looking at the history of environmentalism can give insight into what to avoid in order to stay bipartisan.
Partisanship was not inevitable. It occurred as the result of choices and alliances made by individual decision makers. If they had made different choices, environmentalism could have ended up being a bipartisan issue, like it was in the 1980s and is in some countries...
Thank you!
The links to the report are now fixed.
The 4 blog posts cover most of the same ground as the report. The report goes into more detail, especially in sections 5 & 6.
Authors: David "davidad" Dalrymple, Joar Skalse, Yoshua Bengio, Stuart Russell, Max Tegmark, Sanjit Seshia, Steve Omohundro, Christian Szegedy, Ben Goldhaber, Nora Ammann, Alessandro Abate, Joe Halpern, Clark Barrett, Ding Zhao, Tan Zhi-Xuan, Jeannette Wing, Joshua Tenenbaum
Abstract:
...Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. This is achieved by the interplay of three core components:
I wrote up some of my thoughts on Bengio's agenda here.
TLDR: I'm excited about work on trying to find any interpretable hypothesis which can be highly predictive on hard prediction tasks (e.g. next token prediction).[1] From my understanding, the bayesian aspect of this agenda doesn't add much value.
I might collaborate with someone to write up a more detailed version of this view which engages in detail and is more clearly explained. (To make it easier to argue against and to exist as a more canonical reference.)
As far as Davidad, I think the "manually bui...