
Email Announcement Archive

[Users] Perlmutter Scratch File System Degraded: A Small Number of Files Inaccessible

Author: Kevin Gott <kngott_at_lbl.gov>
Date: 2023-09-20 11:09:21

Hello NERSC Users,

Last night at about 19:00 Pacific Time, one of the OSTs in Perlmutter’s scratch file system went offline. Until the OST can be recovered, data on this OST will be inaccessible to users. The data on this OST is safe, but for now attempts to list, read, or update these files will fail with errors like “Input/output error” or “cannot access <file>: Cannot send after transport endpoint shutdown”.

This is one of 274 OSTs on the system, so most files on Perlmutter scratch will be unaffected, but specific files will be inaccessible until the OST is repaired.

To avoid unexpected failures, we have held jobs that request “scratch” licenses so they won’t run. Jobs that were submitted from the scratch file system will need to be re-submitted from another directory (for example, from your home directory, using the full path of the job script in scratch: “sbatch $SCRATCH/my_job.sh”). For jobs that use scratch data but were launched elsewhere, you can unblock your job by removing the “scratch” license from the existing job with “scontrol update job=<jobid> Licenses=""”.

NERSC engineers have opened a critical case with the vendor and are actively working with them to address the issue. We will update users as the situation progresses. For the most up-to-date information, please refer to the NERSC MOTD <https://www.nersc.gov/live-status/motd/>.

Best Regards,
Kevin Gott & Lisa Gerhardt
NERSC

_______________________________________________
Users mailing list
Users@nersc.gov
