Author: Rebecca Hartman-Baker <>
Date: 2020-11-17 07:51:10

# NERSC Weekly Email, Week of November 16, 2020 <a name="top"></a> #

## Contents ##

- [Summary of Upcoming Events and Key Dates](#dates)

## [NERSC Status](#section1) ##

- [NERSC Operations Continue, with Minimal Shelter-in-Place Impacts](#curtailment)

## [This Week's Events and Deadlines](#section2) ##

- [(NEW) Join Us for the NUG Meeting, this Thursday 19 November, 11am PT](#webinar)
- [(NEW) Expect Invitation to 2020 NERSC User Survey this Week](#usersurvey)

## [Updates at NERSC ](#section3) ##

- [Final Full-Facility Power Outage December 15-20](#powerupgrade)
- [Update on cscratch1 Issues in September/October](#cscratchupdate)
- [New PE Installations for this Week's Maintenance](#novpe)
- [Try Out the New NERSC Help Portal!](#helpportal)
- [Test out NERSC's New, Filesystem-Like HPSS Interface!](#hpss)

## [Calls for Participation](#section4) ##

- [Call for Participation: First International Symposium on Checkpointing for Supercomputing (SuperCheck21)](#ckpt)

## [Upcoming Training Events ](#section5) ##

- [(NEW) Join Us for a Training on NVIDIA HPC SDK, December 8 & 10](#nvidiatrain)
- [Mark Your Calendar for TotalView Training, December 9!](#tvtutorial)

## [NERSC News ](#section6) ##

- [No New "NERSC User News" Podcast this Week](#nopodcast)
- [Come Work for NERSC!](#careers)
- [Upcoming Outages](#outages)
- [About this Email](#about)

## Summary of Upcoming Events and Key Dates <a name="dates"/></a> ##

```
        November 2020
 Su Mo Tu We Th Fr Sa
  1  2  3  4  5  6  7
  8  9 10 11 12 13 14
 15 16 17 *18**19* 20 21     18 Nov      Cori Monthly Maint [1]
                             19 Nov      NUG Monthly Meeting [2]
 22 23 24 25 *26--27* 28     26-27 Nov   Thanksgiving Holiday [3]
 29 30

        December 2020
 Su Mo Tu We Th Fr Sa
        1  2  3  4  5
  6 *7* *8* *9**10* 11 12    7 Dec       SuperCheck21 Submissions Due [4]
                             8,10 Dec    NVIDIA HPC SDK Training [5]
                             9 Dec       TotalView Debugger Training [6]
 13 14 *15--16--17--18--19-  15-20 Dec   NERSC Building Power Upgrade [7]
 -20* 21 22 23 *24--25--26-  24 Dec-     Christmas/New Year Holiday [8]
 -27--28--29--30--31--        1 Jan 2021

        January 2021
 Su Mo Tu We Th Fr Sa
               --1* 2
  3  4  5  6  7  8  9
 10 11 12 13 14 15 16
 17 *18* 19 *20* 21 22 23    18 Jan      MLK Holiday [9]
                             20 Jan      Cori Monthly Maint Window [1]
 24 25 26 27 28 29 30
 31
```

1. **November 18, 2020 & January 20, 2021**: Cori Monthly Maintenance Window
2. **November 19, 2020**: [NERSC Users Group Monthly Meeting](#webinar)
3. **November 26-27, 2020**: Thanksgiving Holiday (No Consulting or Account Support)
4. **December 7, 2020**: [SuperCheck21 submissions due](#ckpt)
5. **December 8 & 10, 2020**: [NVIDIA HPC SDK OpenMP Offload Training](#nvidiatrain)
6. **December 9, 2020**: [TotalView Debugger Training](#tvtutorial)
7. **December 15-20, 2020**: [NERSC Building Power Upgrade](#powerupgrade)
8. **December 24, 2020-January 1, 2021**: Christmas/New Year Holiday (Limited Consulting & Account Support)
9. **January 18, 2021**: Martin Luther King Jr. Holiday (No Consulting or Account Support)
10. All times are **Pacific Time zone**

- **Upcoming Planned Outage Dates** (see [Outages section](#outages) for more details)
  - **November 18, 2020**: Cori Monthly Maintenance
  - **November 18, 2020**: Data Transfer Nodes (DTNs) Maintenance
  - **November 18, 2020**: JGI db, int, & web servers (short maintenance)
  - **December 15-20, 2020**: Full, sitewide outage (everything unavailable)
- **Other Significant Dates**
  - **January 19, 2021**: Allocation Year 2021 Begins
  - **February 4-5, 2021**: [First International Symposium on Checkpointing for Supercomputing (SuperCheck21)](#ckpt)
  - **February 15, 2021**: Presidents Day Holiday (No Consulting or Account Support)
  - **February 17, 2021**: Monthly Cori Maintenance Window

([back to top](#top))

---

## NERSC Status <a name="section1"/></a> ##

### NERSC Operations Continue, with Minimal Shelter-in-Place Impacts <a name="curtailment"/></a>

Alameda County, California, where NERSC is located, remains under a shelter-in-place order. NERSC remains open while following site-specific protection plans.
We remain in operation as before, with the majority of NERSC staff working remotely for the foreseeable future and only staff essential to operations onsite. You can continue to expect regular online consulting and account support, but no telephone support. Trainings will continue to be held online, or postponed if online is infeasible. Regular maintenances on the systems will continue to be performed while minimizing onsite staff presence, which could result in longer downtimes than would occur under normal circumstances. Because onsite staffing is so minimal, we request that you continue to refrain from calling NERSC Operations except to report urgent system issues.

For **current NERSC systems status**, please see the online [MOTD]( and [current known issues]( webpages.

([back to top](#top))

---

## This Week's Events and Deadlines <a name="section2"/></a> ##

### (NEW) Join Us for the NUG Meeting, this Thursday 19 November, 11am PT <a name="webinar"/></a>

The NUG Monthly Webinar is now the NUG Monthly Meeting, with a more interactive format, held on the third Thursday of every month. Our November meeting is **this Thursday, 19 November, at 11am** (Pacific time), at <>.

Our aim is for these meetings to be a forum where NERSC and its users can celebrate successes, discuss difficulties, and learn from each other. We'll follow the structure described below; please come along and join the discussion!

- **Win-of-the-month:** open discussion for attendees to tell of some success they've had, e.g., getting a paper accepted, solving a problem, or achieving something innovative or high impact using NERSC.
- **Today-I-learned:** open discussion for attendees to point out something that surprised them, or that might be valuable for other users to know.
- **Announcements and CFPs:** upcoming conferences, workshops, or other events.
- **Topic-of-the-day:** This month's topic is "NERSC and NERSC Users at SC20".
This week and last week, NERSC staff and many users participated in SC20, the largest annual supercomputing conference. We'll run through some highlights, have an open discussion of what our users found interesting, and build up a list of topics and talks of interest to the NERSC user community.
- **Coming up:** Nominations and requests for future topics. We're especially interested to hear from our users: what are you using NERSC for, and what are you learning that might be helpful for other NERSC users, and for NERSC?
- **Last month's numbers:** NERSC center metrics and info for the most recent month.

Please see <> for details.

### (NEW) Expect Invitation to 2020 NERSC User Survey this Week <a name="usersurvey"/></a>

NERSC is kicking off its allocation year 2020 user survey this week. We are once again retaining an external company, National Business Research Institute, to perform the survey. Expect an announcement, followed by a personalized invitation to take the survey, this week.

([back to top](#top))

---

## Updates at NERSC <a name="section3"/></a> ##

### Final Full-Facility Power Outage December 15-20 <a name="powerupgrade"/></a>

The final power upgrade for the Perlmutter installation will take place December 15-20. During most of this time, power will be cut to the building where NERSC is housed. **You can expect that for the duration of the outage, NERSC resources will not be available.** More details of the plan will be provided in a standalone email to users and in subsequent weekly email items.

### Update on cscratch1 Issues in September/October <a name="cscratchupdate"/></a>

We are happy to announce that the root cause of the cscratch1 crash in late September, which caused an extended outage on Cori, has been identified, and a fix has been successfully tested. Two separate bugs were identified: one in Lustre that caused the crash itself, and one in a Lustre utility that prevented a fast recovery from the crash.
HPE has provided fixes for both of these, which we have been testing on an isolated, secondary metadata server for over a week now. It will take some weeks to robustly integrate the fixes into Lustre, and to test and deploy the update across cscratch1. In the meantime, the mitigations already in place are still effective: when using Lustre file striping to improve the performance of large-scale I/O, please limit the stripe count to 72 (the setting provided by the `stripe_large` utility). For more about Lustre striping, please see <>.

### New PE Installations for this Week's Maintenance <a name="novpe"/></a>

During this week's Cori maintenance, NERSC will install some new software and remove some old software. **Software defaults will remain the same.** We will install the 20.10 Cray Programming Environment (PE) Software release and retire the 20.03 PE. In addition, we will install the new version of the Intel compiler (release version 2020 Update 2). For more information about specific software versions being added and removed, please see <>.

### Try Out the New NERSC Help Portal! <a name="helpportal"/></a>

NERSC has launched a new help portal, featuring a redesigned "open ticket" form and quick access to other types of requests. Available at <>, the new portal also includes:

- Search: view requests and tickets visible to you from the past year;
- Recent tickets: lists all your tickets from the last 3 months;
- Project tickets: lists open tickets shared with at least one of your NERSC projects;
- Watchlist: lists open tickets for which you are on the watchlist.

After a period of testing, this new interface will become the default for <> (but the "classic view" will remain available).

### Test out NERSC's New, Filesystem-Like HPSS Interface! <a name="hpss"/></a>

We've deployed an experimental interface for HPSS called GHI, which offers a more familiar file system interface for HPSS.
You can use GHI to archive entire directory trees or large files without having to worry about bundling files with htar; the system automatically moves files to a special instance of the HPSS archive dedicated to GHI, in the optimal tape-friendly configuration for you.

Documentation for GHI is available at <>. Learn more by viewing this video demo of the system: <>

This is still an experimental system, so please don't store any unique data in it. If you are interested in trying it out, please [open a ticket]( and we'll give you access.

([back to top](#top))

---

## Calls for Participation <a name="section4"/></a> ##

### Call for Participation: First International Symposium on Checkpointing for Supercomputing (SuperCheck21) <a name="ckpt"/></a>

NERSC invites you to participate in the First International Symposium on Checkpointing for Supercomputing (SuperCheck21), which will be held online February 4-5, 2021. The Call for Participation is now open. We invite researchers, end-users, professionals, and students to participate by submitting an abstract. Topics of interest include (but are not limited to):

- Checkpoint/Restart (C/R) research and tools development
- C/R targeting the full range of supercomputing software
- Pure and hybrid approaches to transparent checkpointing
- Development of new methods for low-overhead checkpointing, new algorithms, software development methods, impact of future hardware, performance evaluation, reproducibility, fault recovery
- C/R scheduling and intervals
- C/R use in production, including all levels of checkpointing (application, job, and system levels)
- Adoption of transparent C/R tools in production workloads
- Application-initiated use of C/R tools
- C/R applications and support on HPC systems

For more information and to submit (or to register for the free symposium), please see <>. The deadline for submissions is Monday, December 7.
([back to top](#top))

---

## Upcoming Training Events <a name="section5"/></a> ##

### (NEW) Join Us for a Training on NVIDIA HPC SDK, December 8 & 10 <a name="nvidiatrain"/></a>

NVIDIA will present a two-part training series for NERSC and OLCF users about using OpenMP target offload with NVIDIA's HPC SDK compilers. The training will introduce OpenMP target offload, the NVIDIA compilers, and best practices for achieving high performance with OpenMP target offload on NVIDIA GPUs. Access to Cori GPU nodes will be provided.

The trainings will be held on Tuesday and Thursday, December 8 and 10, and presented online only, using Zoom. For more information and to register, please see <>.

### Mark Your Calendar for TotalView Training, December 9! <a name="tvtutorial"/></a>

NERSC will host a half-day training event on the TotalView debugger on Wednesday, December 9, 2020. In this training, users will learn how to use one of the most popular parallel GUI debugging tools to identify and fix errors in parallel codes on CPUs and GPUs. The presenters will also provide the latest updates on TotalView features that can further enhance your debugging experience. The training will be presented online, using Zoom. For the agenda and registration, please see <>.

([back to top](#top))

---

## NERSC News <a name="section6"/></a> ##

### No New "NERSC User News" Podcast this Week <a name="nopodcast"/></a>

There will be no new episode of the "NERSC User News" podcast this week. We encourage you to instead enjoy some of our most recent episodes and greatest hits:

- [Software Support Policy]( In this interview with NERSC HPC Consultant Steve Leak, learn about the new NERSC software support policy: what it is, how it works, and its benefits for users and NERSC staff alike.
- [NERSC Power Upgrade]( In this interview with Berkeley Lab Infrastructure Modernization Division's David Topete, learn about the power upgrade happening this weekend, the work that has to be done, and the steps taken to ensure the safety of the workers involved in the effort.
- [Dynamic fan]( NERSC Energy Efficiency Engineer Norm Bourassa talks about how NERSC is saving energy with the dynamic fan settings on the Cori supercomputing cabinets, and what NERSC is doing to make the cabinets even more energy efficient.
- [RAPIDS]( In this interview with NVIDIA RAPIDS senior engineer Nick Becker, learn about the RAPIDS library, how it can accelerate your data science, and how to use it.
- [IO Middleware]( NERSC Principal Data Architect Quincey Koziol talks about IO Middleware: what it is, how you can benefit from using it in your code, and how it is evolving to support data-intensive computing and future supercomputing architectures.
- [NERSC 2019 in Review and Looking Forward]( NERSC director Sudip Dosanjh reflects upon the accomplishments of NERSC and its users in 2019, and what he's looking forward to in 2020 at NERSC.
- [Community File System]( NERSC Storage System Group staff Kristy Kallback-Rose, Greg Butler, and Ravi Cheema talk about the new Community File System and the migration timeline.
- [Monitoring System Performance]( NERSC Computational Systems Group's Eric Roman discusses how NERSC monitors system performance, what we're doing with the data right now, and how we plan to use it in the future.
- [The Superfacility Concept]( Join NERSC Data Science Engagement Group Lead Debbie Bard in a discussion about the concept of the superfacility: what it means, how facilities interact, and what NERSC and partner experimental facilities are doing to prepare for the future of data-intensive science.
- [Optimizing I/O in Applications]( Listen to an I/O optimization success story in this interview with NERSC Data and Analytics Services Group's Jialin Liu.
- [NESAP Postdocs]( Learn from NESAP postdoc Laurie Stephey what it's like working as a postdoc in the NESAP program at NERSC.

The NERSC User News podcast, produced by the NERSC User Engagement Group, is available at <> and syndicated through iTunes, Google Play, Spotify, and more. Please give it a listen and let us know what you think, via a ticket at <>.

### Come Work for NERSC! <a name="careers"/></a>

NERSC currently has several openings for postdocs, system administrators, and more! If you are looking for new opportunities, please consider the following openings:

- [NESAP for Data Postdoctoral Fellow]( Work in multidisciplinary teams to transition data-analysis codes to NERSC's new Perlmutter supercomputer and produce mission-relevant science that truly pushes the limits of high-end computing.
- [NESAP for Simulations Postdoctoral Fellow]( Work in multidisciplinary teams to develop and optimize codes for the Perlmutter system and produce mission-relevant science that pushes the limits of high-performance computing.
- [NESAP for Learning Postdoctoral Fellow]( Work in multidisciplinary teams to develop and implement cutting-edge machine learning/deep learning solutions in codes that will run on NERSC's new Perlmutter supercomputer and produce mission-relevant science that pushes the limits of AI on high-performance computing.
- [Systems / DevOps Engineer]( Use your DevOps and system engineering skills to help build and manage systems that complement NERSC's supercomputing environment.

(**Note:** We have received reports that the URLs for the jobs change without notice, so if you encounter a page indicating that a job is closed or not found, please check by navigating to <>, scrolling down to the 9th picture that says "All Jobs" and clicking on that. Then, under "Business," select "View More" and scroll down until you find the checkbox for "NE-NERSC" and select it.)

We know that NERSC users can make great NERSC employees!
We look forward to seeing your application.

### Upcoming Outages <a name="outages"/></a>

- **NERSC Center**
  - 12/15/20 07:00-12/20/20 23:59 PST, Sitewide power upgrade; all NERSC systems unavailable, including Cori, DTNs, Jupyter, HPSS Archive (user), HPSS Regent (backup), ProjectB, Global Homes, NX Services, Science Gateway Services, DNA, Global Common, MongoDB, Globus, Spin, NoMachine, NEWT, Science Databases, JGI db, int, & web servers, MATLAB, Community File System, Iris, ssh-proxy, Multi-Factor Authentication, and R Studio.
- **Cori**
  - 11/18/20 07:00-20:00 PST, Scheduled Maintenance
  - 01/20/21 07:00-20:00 PST, Scheduled Maintenance
  - 02/17/21 07:00-20:00 PST, Scheduled Maintenance
- **DTNs**
  - 11/18/20 07:00-12:00 PST, Scheduled Maintenance
    - DTNs will be down for OS and GPFS updates.

Visit <> for latest status and outage information.

### About this Email <a name="about"/></a>

You are receiving this email because you are the owner of an active account at NERSC.

_______________________________________________
Users mailing list

