Elmord's Magic Valley

Computers, languages, and computer languages. Sometimes in Portuguese, sometimes in English.

Honey, I moved ld.so (or, How to recover your system if you moved /usr or /lib)

2024-05-26 23:35 +0100. Tags: comp, unix, in-english

Suppose, not entirely hypothetically, that you move /usr to /usr_old in a running Linux system. If you are using a modern system in which /bin, /lib, etc. are symlinks to /usr/bin, /usr/lib, etc., you will find out that no commands work anymore, including mv, which means you cannot undo the mess:

root@cursed:/# mv /usr /usr_old
root@cursed:/# ls
bash: ls: command not found
root@cursed:/# mv /usr_old /usr
bash: /usr/bin/mv: No such file or directory

So far, not surprising, because it’s still looking up the commands in the old path. But if you update PATH, it still doesn’t work:

root@cursed:/# PATH=/usr_old/bin
root@cursed:/# mv /usr_old /usr
bash: /usr_old/bin/mv: No such file or directory

bash finds mv in the new path (/usr_old/bin/mv), but says the file doesn’t exist. In fact, even if you call the executable by its full path, you will get the same error, even though the file is there:

root@cursed:/# /usr_old/bin/mv
bash: /usr_old/bin/mv: No such file or directory

What gives?

If you’re using a recent enough version of bash (5.2+), you will get a slightly less mystifying message:

root@cursed:/# /usr_old/bin/mv
bash: /usr_old/bin/mv: cannot execute: required file not found

So the problem is not that /usr_old/bin/mv is missing, but some required file. But what required file?

If you just want the solution, you can skip to the end of the post. The rest of this post is a deep dive into what is going on here.

Understanding the error

Let’s go back to the older bash error message:

bash: /usr_old/bin/mv: No such file or directory

Why is bash saying that the file does not exist? To understand this one, we need to know a bit about how error handling works in C. C does not have exceptions: when a function fails, it typically signals the error by returning an error value, which depends on the specific function. A lot of standard library / POSIX functions indicate errors by returning -1 and setting a global errno variable to one of various constants indicating which specific error happened. In the errno manpage, we can see a list of possible error constants; one of these is ENOENT, whose meaning is “No such file or directory”. (The name ENOENT comes from “error: no entry”, i.e., no such entry in the directory when looking up a file name.)

There is also a function strerror which, given an error constant, returns a string corresponding to the error. So, for example, strerror(ENOENT) returns the string No such file or directory (or possibly a locale-appropriate equivalent). Programs often use this function (or similar ones such as perror) to report errors to the user. So if a program says No such file or directory, it’s quite likely some system call returned ENOENT to it.
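
The same convention can be observed from Python, whose os and errno modules wrap the C facilities described above. This is only a small sketch, assuming a Linux system with the default (untranslated) messages. The path and the helper name stat_error are made up for the example.

```python
import errno
import os

# Hypothetical path, chosen only because it should not exist.
MISSING = "/no/such/dir/no-such-file"

def stat_error(path):
    """Return (errno, message) for a failing stat(), or None if it succeeds."""
    try:
        os.stat(path)
    except OSError as e:
        # e.errno is the value the C library left in errno;
        # os.strerror() wraps the C strerror() function.
        return e.errno, os.strerror(e.errno)
    return None

code, message = stat_error(MISSING)
print(errno.errorcode[code], message)
```

On a typical Linux system this should print the constant's name, ENOENT, followed by the familiar message. Note that neither value says which path was missing; the caller has to remember that.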

There are two things to keep in mind here, though. First, the error constant does not carry any information about which file was not found: if a system call fails with ENOENT, all we know is that it hit a “no such file/directory/entry” error during its execution. Only context can tell us what file it might refer to. Second, even though the error constants have more or less standardized meanings, specific system calls can give more specific meanings to these constants. Each system call’s manpage specifies in what circumstances each error constant is produced. To figure out what ENOENT really means in a given situation, we need to know which system call produced it. So, which system call was bash calling? And why did the message change in bash 5.2?

We can figure that out by hunting down the message in bash’s git history. The added bit is:

+      else if (i == ENOENT)
+       {
+         errno = i;
+         internal_error (_("%s: cannot execute: required file not found"), command);
+       }

If we look further up in this file, we will see that i is set to the value of errno after calling execve, a system call to execute a program. bash is using it to try to execute /usr_old/bin/mv, but it’s getting an ENOENT back, so it prints the required file not found error.

Before bash 5.2, ENOENT was not handled specially here. Instead, the code fell through to a more generic error-handling path further down in this function, which calls file_error, which in turn uses strerror to generate the error message. That is why we ended up seeing No such file or directory. So at least that part makes sense.

Now we need to figure out why execve is giving an ENOENT. If we look at execve’s manpage, we see that, for this specific system call, an ENOENT means:

The file pathname or a script or ELF interpreter does not exist.

That’s interesting. execve fails with ENOENT not only when the file to be executed does not exist (which is not our case), but also when the ELF interpreter does not exist. So…
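
The script half of that sentence is easy to reproduce without breaking anything. The sketch below (Python, assuming a Linux system where the temporary directory is not mounted noexec) creates an executable script whose #! line names an interpreter that does not exist, then tries to run it. The interpreter path and the helper name are invented for the example.

```python
import errno
import os
import subprocess
import tempfile

def run_with_missing_interpreter():
    """Try to execute an existing script whose #! interpreter is missing.

    Returns (script_existed, errno_from_exec).
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "hello")
        with open(script, "w") as f:
            # Hypothetical interpreter path that should not exist.
            f.write("#!/no/such/interpreter\necho hello\n")
        os.chmod(script, 0o755)
        try:
            subprocess.run([script], check=False)
        except OSError as e:
            # subprocess reports the errno of the failed exec call here.
            return os.path.exists(script), e.errno
        return os.path.exists(script), 0

existed, err = run_with_missing_interpreter()
print(existed, errno.errorcode.get(err))
```

The script is there and is executable, yet the attempt to run it should fail with ENOENT. This is the same situation as in the post, except that there the missing interpreter is an ELF one.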

What the hell is an ELF interpreter?

ELF is the binary format used for executables in Linux and various other Unix-like systems. Most Linux executables are dynamically linked: they are not standalone executables, but rather they depend on system libraries that need to be loaded at runtime for the program to work. Most programs depend at least on libc (the standard C library), and usually on a bunch of others as well. These libraries have to be loaded and linked to the main program (i.e., references from the main program to library functions and variables have to be adjusted to point to the places in memory where the library was actually loaded at runtime) before it starts to run.

The way this works is by having another program, the dynamic linker, called ld.so or ld-linux.so, do the job of loading the main program and the libraries, linking them together, and then starting the main program. So when you call an executable like mv, what actually gets executed first is ld-linux.so, which loads mv, figures out which libraries it depends on, loads those libraries, links everything together, and then passes control to mv.

And how does the system know that to run mv it has to call ld-linux.so first? Well, the mv executable (and every other executable that uses the dynamic linker) has ld-linux.so as its ELF interpreter. You know how a shell script specifies an interpreter by having #!/path/to/interpreter in its first line, and when you invoke the script, that program gets called to process the script? Well, an ELF file can also specify an interpreter, by embedding the path to the interpreter as a section of the ELF file. If you run file on mv (in a non-broken system), you will see something like:

$ file /bin/mv
/bin/mv: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=24ef388d6e73508a1be274e260bbe654edb327be, for GNU/Linux 3.2.0, stripped

Here we see that its interpreter is /lib64/ld-linux-x86-64.so.2, which is what will get called when you invoke mv. (The process is more convoluted than a regular shell script interpreter invocation, which means you can’t use any random program as an ELF interpreter, but the idea is similar.)
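
For the curious, the interpreter path can also be read straight out of the binary. In the ELF format it is recorded through a program header entry of type PT_INTERP, which points at the bytes of the path inside the file. Here is a rough sketch in Python that handles only 64-bit little-endian ELF files (the kind shown above). The function name elf_interpreter is made up for this example, and real tools such as file or readelf handle many more cases.

```python
import struct

PT_INTERP = 3  # program header type that locates the interpreter path

def elf_interpreter(data):
    """Return the interpreter path of a 64-bit little-endian ELF image, or None."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    # ELF64 file header: we only need the program header table location.
    header = struct.unpack_from("<16sHHIQQQIHHHHHH", data, 0)
    phoff, phentsize, phnum = header[5], header[9], header[10]
    for i in range(phnum):
        # ELF64 program header: type, flags, offset, vaddr, paddr,
        # filesz, memsz, align.
        entry = struct.unpack_from("<IIQQQQQQ", data, phoff + i * phentsize)
        p_type, p_offset, p_filesz = entry[0], entry[2], entry[5]
        if p_type == PT_INTERP:
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None  # no PT_INTERP entry, e.g. a statically linked executable
```

On a working system, something like elf_interpreter(open("/bin/mv", "rb").read()) should return the same path that file reports.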

Now, /lib64/ld-linux-x86-64.so.2 is a symlink to /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2, and /lib is a symlink to /usr/lib, so the system is looking for /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2, which is missing because we moved /usr. That is the missing file!
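
The effect of the move on those symlinks can be reproduced safely in a scratch directory. Below is a sketch in Python, assuming a merged-/usr layout in which lib is a relative symlink to usr/lib. The file ld.so here is just a placeholder standing in for the real dynamic linker, and simulate_move is a name invented for the example.

```python
import os
import tempfile

def simulate_move():
    """Recreate lib -> usr/lib in a scratch directory, then rename usr.

    Returns (visible_before, visible_after, link_still_present).
    """
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "usr", "lib"))
        # Placeholder for the dynamic linker; only its existence matters.
        with open(os.path.join(root, "usr", "lib", "ld.so"), "w") as f:
            f.write("placeholder\n")
        os.symlink("usr/lib", os.path.join(root, "lib"))

        via_link = os.path.join(root, "lib", "ld.so")
        before = os.path.exists(via_link)

        # The equivalent of: mv /usr /usr_old
        os.rename(os.path.join(root, "usr"), os.path.join(root, "usr_old"))
        after = os.path.exists(via_link)
        link_present = os.path.islink(os.path.join(root, "lib"))
        return before, after, link_present

print(simulate_move())
```

The symlink itself survives the rename, but it now points at a directory that no longer exists under that name, so every path that goes through it stops resolving.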

But just like we can invoke a script either directly by its name or by invoking its interpreter explicitly, passing the path to the script as an argument, it turns out we can also call ld-linux.so explicitly, passing our executable as an argument:

root@cursed:/# /usr_old/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 /usr_old/bin/mv
/usr_old/bin/mv: error while loading shared libraries: libselinux.so.1: cannot open shared object file: No such file or directory

Well, we’re not quite there yet, but that’s progress: it found the linker (because we called it explicitly), but now the linker is failing because it cannot find the libraries mv depends on (since we also moved them). But if you call /usr_old/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 --help, you can see that the linker supports a bunch of options, one of which is --library-path, which we can use to point it to the new path of the dynamic libraries.

The solution

The solution, then, is to invoke the linker manually and provide the modified library path explicitly:

root@cursed:/# /usr_old/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 \
    --library-path /usr_old/lib/x86_64-linux-gnu/ \
    /usr_old/bin/mv /usr_old /usr

And now everything works again:

root@cursed:/# ls
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
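
The recovery command follows a regular pattern, so it can be assembled for other programs as well. The sketch below (Python) only builds the command line and does not run anything. The helper name loader_command and its parameters are invented for this example, and the loader file name and multiarch directory are the x86-64 ones shown in this post; other systems may use different paths.

```python
import shlex

def loader_command(moved_prefix, program, args, multiarch="x86_64-linux-gnu"):
    """Build the argv that runs `program` through the relocated dynamic linker."""
    libdir = f"{moved_prefix}/lib/{multiarch}"
    # Loader file name as it appears in this post (x86-64 only).
    loader = f"{libdir}/ld-linux-x86-64.so.2"
    return [loader, "--library-path", libdir,
            f"{moved_prefix}/bin/{program}", *args]

print(shlex.join(loader_command("/usr_old", "mv", ["/usr_old", "/usr"])))
```

On the broken system itself Python will not start normally either, since it is dynamically linked too, so the resulting line is something to type into the shell that is still open.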

Copyright © 2010-2024 Vítor De Araújo
The content of this blog, unless otherwise specified, may be used under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
