
Email Announcement Archive

[Users] NERSC Weekly Email, Week of July 4, 2022

Author: Rebecca Hartman-Baker <rjhartmanbaker_at_lbl.gov>
Date: 2022-07-04 07:47:33

# NERSC Weekly Email, Week of July 4, 2022<a name="top"></a> #

## Contents ##

- [Summary of Upcoming Events and Key Dates](#dates)

## [NERSC Status](#section1) ##

- [NERSC Operations Continue as Berkeley Lab Reopens, with Minimal Changes](#curtailment)

## [This Week's Events and Deadlines](#section2) ##

- [Independence Day Holiday Today; No Consulting or Account Support](#indday)
- [IDEAS-ECP Webinar on "Growing preCICE from an as-is Coupling Library to a Sustainable, Batteries-included Ecosystem" on Wednesday](#ecpwebinar2)
- [Tutorial on Coordinating Dynamic Ensembles of Computations with libEnsemble on Thursday](#libensemble)
- [(NEW) Jupyter Maintenance on Thursday, 8-9 am Pacific Time](#jupytermaint)

## [Perlmutter](#section3) ##

- [Perlmutter Machine Status](#perlmutter)
- [Integration of Perlmutter Phase 2 Will Minimize System Downtime](#pmintegration)
- ["Preempt" Queue Available on Perlmutter Nodes](#pmpreempt)
- [CPU-Only Nodes Available on Perlmutter; Try Them Out Free of Charge!](#pmcpu)
- [(NEW) Jupyter Users: Please Try Jupyter on Perlmutter](#pmjupyter)

## [Updates at NERSC](#section4) ##

- [Announcing the NUG SIG for WRF Users at NERSC](#nugwrf)
- [Many Counters Used by Performance Tools Temporarily Disabled; NERSC Awaiting Vendor Patch Before Re-enabling](#vtune)

## [Calls for Participation](#section5) ##

- [(NEW) Call for Papers: 9th Workshop on Accelerator Programming Using Directives (WACCPD) at SC22](#waccpd)
- [Call for Participation: Third International Symposium on Checkpointing for Supercomputing](#supercheck)

## [Upcoming Training Events](#section6) ##

- [(NEW) Introduction to HIP Programming, July 14](#introhip)
- [(NEW) HIP for CUDA Programmers, July 21](#hip4cuda)
- [(NEW) Register Today for August 25 E4S at NERSC Training Event!](#e4snersc)
- [Learn to Use Spin to Build Science Gateways at NERSC: Next SpinUp Workshop Starts August 10!](#spinup)

## [NERSC News](#section7) ##

- [Come Work for NERSC!](#careers)
- [Upcoming Outages](#outages)
- [About this Email](#about)

## Summary of Upcoming Events and Key Dates <a name="dates"/></a> ##

    July 2022
    Su Mo Tu We Th Fr Sa
                    1  2
     3 *4* 5 *6**7* 8  9       4 Jul      Independence Day Holiday [1]
                               6 Jul      IDEAS-ECP Monthly Webinar [2]
                               7 Jul      Python libEnsemble Tutorial [3]
                               7 Jul      Jupyter Maintenance [4]
    10 11 12 13*14*15 16      14 Jul      Intro to HIP Training [5]
    17 18 19*20*21 22 23      20 Jul      Cori Monthly Maintenance [6]
                              21 Jul      HIP for CUDA Prog Training [7]
    24 25 26*27*28 29 30      27 Jul      Intro to HDF5 Training [8]
    31

    August 2022
    Su Mo Tu We Th Fr Sa
        1  2  3  4 *5* 6       5 Aug      WACCPD Submissions Due [9]
     7  8  9*10*11 12 13      10 Aug      SpinUp Workshop [10]
    14 15 16*17*18 19 20      17 Aug      Cori Monthly Maintenance [6]
    21 22 23 24*25**26*27     25 Aug      E4S at NERSC Training [11]
                              25-26 Aug   AI for Science Bootcamp [12]
                              26 Aug      SuperCheck-SC22 Subs Due [13]
    28 29 30 31

    September 2022
    Su Mo Tu We Th Fr Sa
                 1  2  3
     4 *5* 6  7  8  9 10       5 Sep      Labor Day Holiday [14]
    11 12 13 14 15 16 17
    18 19 20*21*22 23 24      21 Sep      Cori Monthly Maintenance [6]
    25 26 27 28 29 30

1. **July 4, 2022**: [Independence Day Holiday](#indday) (No Consulting or Account Support)
2. **July 6, 2022**: [IDEAS-ECP Monthly Webinar](#ecpwebinar2)
3. **July 7, 2022**: [Python libEnsemble Tutorial](#libensemble)
4. **July 7, 2022**: [Jupyter Maintenance](#jupytermaint)
5. **July 14, 2022**: [Introduction to HIP Training](#introhip)
6. **July 20, August 17, & September 21, 2022**: Cori Monthly Maintenance
7. **July 21, 2022**: [HIP for CUDA Programmers Training](#hip4cuda)
8. **July 27, 2022**: [Intro to HDF5 Training](#introhdf5)
9. **August 5, 2022**: [WACCPD Submissions Due](#waccpd)
10. **August 10, 2022**: [SpinUp Workshop](#spinup)
11. **August 25, 2022**: [E4S at NERSC Training](#e4snersc)
12. **August 25-26, 2022**: [AI for Science Bootcamp](#ai4sci)
13. **August 26, 2022**: [Submissions due for SuperCheck-SC22](#supercheck)
14. **September 5, 2022**: Labor Day Holiday (No Consulting or Account Support)
15. All times are **Pacific Time zone**

- **Upcoming Planned Outage Dates** (see [Outages section](#outages) for more details)
    - **Thursday**: Jupyter
- **Other Significant Dates**
    - **August 3-4, 2022**: OpenACC and Hackathons 2022 Summit
    - **October 5 & November 30, 2022**: SpinUp Workshops
    - **October 19 & November 16, 2022**: Cori Monthly Maintenance Window
    - **November 14, 2022**: [SuperCheck-SC22 Workshop](https://supercheck.lbl.gov)
    - **November 24-25, 2022**: Thanksgiving Holiday (No Consulting or Account Support)
    - **December 23, 2022-January 2, 2023**: Winter Shutdown (Limited Consulting and Account Support)

([back to top](#top))

---

## NERSC Status <a name="section1"/></a> ##

### NERSC Operations Continue as Berkeley Lab Reopens, with Minimal Changes <a name="curtailment"/></a>

Berkeley Lab, where NERSC is located, is beginning to welcome employees back on-site following a two-year absence. NERSC remains in operation, with the majority of NERSC staff continuing to work remotely and staff essential to operations on-site. We do not expect any disruptions to our operations in the next few months as the site reopens.

You can continue to expect regular online consulting and account support, as well as schedulable online appointments. Trainings continue to be held online. Regular maintenances on the systems continue to be performed while minimizing onsite staff presence, which could result in longer downtimes than would occur under normal circumstances.

Because onsite staffing remains minimal, we request that you continue to refrain from calling NERSC Operations except to report urgent system issues.

For **current NERSC systems status**, please see the online [MOTD](https://www.nersc.gov/live-status/motd/) and [current known issues](https://docs.nersc.gov/current/) webpages.

([back to top](#top))

---

## This Week's Events and Deadlines <a name="section2"/></a> ##

### Independence Day Holiday Today; No Consulting or Account Support <a name="indday"/></a>

Consulting and account support will be unavailable today, Monday, July 4, due to the Berkeley Lab-observed Independence Day holiday. Regular consulting and account support will resume tomorrow.

### IDEAS-ECP Webinar on "Growing preCICE from an as-is Coupling Library to a Sustainable, Batteries-included Ecosystem" on Wednesday <a name="ecpwebinar2"/></a>

The July webinar in the [Best Practices for HPC Software Developers](http://ideas-productivity.org/events/hpc-best-practices-webinars/) series is entitled "Growing preCICE from an as-is Coupling Library to a Sustainable, Batteries-included Ecosystem" and will take place this **Wednesday, July 6, at 10:00 am Pacific Time**. In this webinar, Gerasimos Chourdakis (Technical University of Munich) will talk about how the preCICE library grew from a humble coupling library for fluid-structure interaction problems, used by just a few academic groups in Germany, to a complete coupling ecosystem used by more than a hundred research groups worldwide for a wide range of multi-physics applications.
This required more than just simple software changes; effective documentation and community-building practices were imperative. The webinar focuses on lessons learned that can help any research software project grow in a sustainable way.

There is no cost to attend, but registration is required. Please register at <https://www.exascaleproject.org/event/precice-ecosystem/>.

### Tutorial on Coordinating Dynamic Ensembles of Computations with libEnsemble on Thursday <a name="libensemble"/></a>

Are you running large numbers of computations to train models, perform optimizations based on simulation results, or perform other adaptive parameter studies? If so, consider registering for the upcoming tutorial on libEnsemble, a Python toolkit for coordinating asynchronous and dynamic ensembles of calculations across massively parallel resources.

The tutorial, which will be held from 10 am to 11:30 am (Pacific time) this Thursday, July 7, will address how to couple libEnsemble workflows with any user application and apply advanced features, including the allocation of variable resources and the cancellation of simulations based on intermediate outputs. Using examples from current ECP software technology and application integrations, the presenters will demonstrate how libEnsemble's mix-and-match approach can help interface libraries and applications with exascale-level resources.

For more information and to register, please see <https://www.exascaleproject.org/event/libensemble_jul2022/>.

### (NEW) Jupyter Maintenance on Thursday, 8-9 am Pacific Time <a name="jupytermaint"/></a>

NERSC's Jupyter service will be upgraded Thursday, July 7, between 8 and 9 am PDT. This is planned to be a minimally disruptive maintenance: notebook servers that are running at the time of the upgrade should continue to run. During the maintenance Jupyter will be marked as degraded, but after it is over users should be able to restart notebook servers and "pick up" improvements on their own schedule.

The maintenance should address the [dask-labextension pop-up issue](https://github.com/dask/dask-labextension/issues/226), hopefully reduce the volume of [configuration loaded messages from Jupytext](https://github.com/mwouts/jupytext/issues/959) in user notebook server logs, and fix a recent issue with the Jupyter HDF5 plug-in that rendered it inoperable.

([back to top](#top))

---

## Perlmutter <a name="section3"/></a> ##

### Perlmutter Machine Status <a name="perlmutter"/></a>

The initial phase of the Perlmutter supercomputer is in the NERSC machine room, running user jobs. Some nodes of the CPU-only second phase of the machine have been added to the system. NERSC has now added all users to Perlmutter, and everyone is welcome to try out the GPU-accelerated nodes and the new CPU-only nodes. The walltime limit for jobs on Perlmutter has been raised from 6 hours to 12 hours.

This newsletter section will be updated each week with the latest Perlmutter status.

### Integration of Perlmutter Phase 2 Will Minimize System Downtime <a name="pmintegration"/></a>

All cabinets for the second phase of Perlmutter have arrived. The Phase 2 nodes need to be added to the system, and this integration process has been designed to minimize system downtime for users. You may see the number of available nodes continue to fluctuate, but we expect to be able to keep at least 500 Phase 1 nodes available to users throughout the process.

While we will keep Perlmutter available as much as possible, it is not yet a production system, so there are no uptime guarantees.
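If you would like to see how many nodes are up at a given moment while the integration proceeds, a quick Slurm query from a Perlmutter login node shows per-partition node counts. The sketch below is illustrative only; partition names and counts will vary as Phase 2 nodes come and go.

```bash
# Summarize node counts per partition; the NODES(A/I/O/T) column shows
# allocated/idle/other/total nodes.
sinfo -s

# Or print just the fields of interest: partition, availability, node count, node state.
sinfo -o "%P %a %D %t"
```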
### "Preempt" Queue Available on Perlmutter Nodes <a name="pmpreempt"/></a> A "preempt" queue is available on the Perlmutter system for jobs running on GPU or CPU nodes. This queue is aimed at users whose jobs are capable of running for a relatively short amount of time before terminating. For example, if your code is able to checkpoint and restart where it left off, you may be interested in the preempt queue. The preempt queue is accessed by adding `-q preempt` in your job script. Jobs in this queue may specify a walltime up to 24 hours (vs. the current max walltime of 12 hours), but are subject to preemption after 2 hours. Additionally, the maximum number of nodes requested must not exceed 128. For an example preemptible job script, please see our documentation pages: <https://docs.nersc.gov/jobs/examples/#preemptible-jobs>. ### CPU-Only Nodes Available on Perlmutter; Try Them Out Free of Charge! <a name="pmcpu"/></a> The second phase of Perlmutter nodes contain only CPUs. Approximately 1500 nodes are currently available for NERSC users. We invite you to try out the CPU nodes with your workflows. The nodes are very similar to the CPU nodes on Cori but NERSC provides specific information on [compiling](https://docs.nersc.gov/development/compilers/) and [running jobs](https://docs.nersc.gov/systems/perlmutter/running-jobs/) in our documentation. Like the Perlmutter GPU nodes, there is currently **no charge** to use the CPU nodes. Please remember to use your CPU allocation account (*without* trailing `_g`) when running on the Perlmutter CPU nodes. ### (NEW) Jupyter Users: Please Try Jupyter on Perlmutter <a name="pmjupyter"/></a> In June, more than 1400 unique users, excluding staff, used Jupyter on Cori, and on any given day about 400 notebook servers are running on Cori's 4 dedicated Jupyter nodes. Summer brings a surge of Jupyter usage, and this summer we have once again broken our record in terms of users per month. We're working on configuration changes to try to address performance/resilience issues; we may have more about that next week. But in contrast to Cori, the use of Jupyter on Perlmutter is less than 100 notebook servers at a time. And on Perlmutter, Jupyter notebook sessions are allowed to start up on any one of its 40 login nodes (it's not limited to just a few nodes like on Cori). In addition, users can use Perlmutter's considerably more powerful CPU and GPU compute nodes for Jupyter work --- currently free of charge. You may need to move data from Cori scratch to Perlmutter scratch to get started, but NERSC has you covered with [Globus](https://docs.nersc.gov/services/globus/). And of course if you have any problems migrating to Perlmutter with moving data or setting up your environment, just reach out to us via [ticket](https://help.nersc.gov) so we can help. Make the switch sooner rather than later! ([back to top](#top)) --- ## Updates at NERSC <a name="section4"/></a> ## ### Annoucing the NUG SIG for WRF Users at NERSC <a name="nugwrf"/></a> NERSC users come from many institutions and do diverse research, but can share common challenges and best practices. The new special interest group for the Weather Research and Forecasting (WRF) model users at NERSC is a forum for participants to share compilation scripts, data, and tips for using WRF at NERSC. It meets via Zoom and uses the [`#wrf_user` channel at NUG Slack](https://www.nersc.gov/users/NUG/nersc-users-slack/) for discussion. 
### CPU-Only Nodes Available on Perlmutter; Try Them Out Free of Charge! <a name="pmcpu"/></a>

The second phase of Perlmutter consists of CPU-only nodes, and approximately 1500 of them are currently available to NERSC users. We invite you to try out the CPU nodes with your workflows. The nodes are very similar to the CPU nodes on Cori, but NERSC provides Perlmutter-specific information on [compiling](https://docs.nersc.gov/development/compilers/) and [running jobs](https://docs.nersc.gov/systems/perlmutter/running-jobs/) in our documentation.

Like the Perlmutter GPU nodes, there is currently **no charge** to use the CPU nodes. Please remember to use your CPU allocation account (*without* the trailing `_g`) when running on the Perlmutter CPU nodes.

### (NEW) Jupyter Users: Please Try Jupyter on Perlmutter <a name="pmjupyter"/></a>

In June, more than 1400 unique users (excluding staff) used Jupyter on Cori, and on any given day about 400 notebook servers are running on Cori's 4 dedicated Jupyter nodes. Summer brings a surge of Jupyter usage, and this summer we have once again broken our record for users per month. We're working on configuration changes to try to address performance and resilience issues; we may have more to share about that next week.

By contrast, Jupyter usage on Perlmutter is fewer than 100 notebook servers at a time. On Perlmutter, Jupyter notebook sessions are allowed to start up on any one of its 40 login nodes (not just a few dedicated nodes as on Cori). In addition, users can use Perlmutter's considerably more powerful CPU and GPU compute nodes for Jupyter work, currently free of charge.

You may need to move data from Cori scratch to Perlmutter scratch to get started, but NERSC has you covered with [Globus](https://docs.nersc.gov/services/globus/). And of course, if you have any problems migrating to Perlmutter, whether with moving data or setting up your environment, just reach out to us via [ticket](https://help.nersc.gov) so we can help. Make the switch sooner rather than later!

([back to top](#top))

---

## Updates at NERSC <a name="section4"/></a> ##

### Announcing the NUG SIG for WRF Users at NERSC <a name="nugwrf"/></a>

NERSC users come from many institutions and do diverse research, but they can share common challenges and best practices. The new special interest group for Weather Research and Forecasting (WRF) model users at NERSC is a forum for participants to share compilation scripts, data, and tips for using WRF at NERSC. It meets via Zoom and uses the [`#wrf_user` channel on NUG Slack](https://www.nersc.gov/users/NUG/nersc-users-slack/) for discussion.

If you use WRF at NERSC, we invite you to join the WRF Users at NERSC SIG via that Slack channel or [this sign-up sheet](https://forms.gle/vts4AYvtKuzju9qn7).

### Many Counters Used by Performance Tools Temporarily Disabled; NERSC Awaiting Vendor Patch Before Re-enabling <a name="vtune"/></a>

Many of the counters used by performance tools have been temporarily disabled on NERSC resources to mitigate a security vulnerability. This could impact users of many performance tools, including VTune, CrayPat, PAPI, Nsight Systems (CPU metrics only), HPCToolkit, MAP, and more. NERSC is awaiting a patch from the vendor before re-enabling these counters. We will let you know when this issue has been resolved.

([back to top](#top))

---

## Calls for Participation <a name="section5"/></a> ##

### (NEW) Call for Papers: 9th Workshop on Accelerator Programming Using Directives (WACCPD) at SC22 <a name="waccpd"/></a>

The Call for Papers for the 9th Workshop on Accelerator Programming Using Directives is now open! The workshop aims to showcase all aspects of accelerator programming for heterogeneous systems, such as innovative high-level language or library approaches, lessons learned while using directives or other portable approaches to migrate scientific legacy code to modern systems, and compilation and runtime scheduling techniques.

The paper submission deadline is August 5, 2022 (AOE). For more information, see <https://www.waccpd.org/>.

### Call for Participation: Third International Symposium on Checkpointing for Supercomputing <a name="supercheck"/></a>

You are invited to participate in the Third International Symposium on Checkpointing for Supercomputing (SuperCheck-SC22), which will be held on November 14, 2022, in conjunction with SC22. The workshop will feature the latest work in checkpoint/restart research, tools development, and production use.

Topics of interest for the workshop include but are not limited to:

- Application-level checkpointing: APIs to define critical states, techniques to capture critical states (e.g., efficient serialization)
- Transparent/system-level checkpointing: techniques to capture the state of devices and accelerators (CPUs, GPUs, network interfaces, etc.)
- I/O and storage solutions that leverage heterogeneous storage to persist checkpoints at scale
- Checkpoint size-reduction techniques (compression, deduplication)
- Alternative techniques that avoid persisting checkpoints to storage (e.g., erasure coding)
- Synchronous vs. asynchronous checkpointing strategies
- Multi-level and hybrid strategies combining application-level, system-level, and transparent checkpointing on heterogeneous hardware
- Application-specific techniques combined with checkpointing (e.g., ABFT)
- Performance evaluation and reproducibility, study of real failures and their recovery
- Research on optimal checkpointing intervals, C/R-aware job scheduling, and resource management
- Experience with traditional use cases of checkpointing on novel platforms
- New use cases of checkpointing beyond resilience
- Support on HPC systems (e.g., resource scheduling, system utilization, batch system integration, best practices, etc.)

The call for participation is available at <https://supercheck.lbl.gov/call-for-participation>. Submissions are due **August 26, 2022**.

([back to top](#top))

---

## Upcoming Training Events <a name="section6"/></a> ##

### (NEW) Introduction to HIP Programming, July 14 <a name="introhip"/></a>

OLCF is offering a training on "Introduction to HIP Programming" next Thursday, July 14.
The training is open to NERSC users. NERSC users who are not also OLCF users will not be able to participate in the hands-on exercises, as HIP is not yet available on Perlmutter, but they are welcome to watch the presentations.

HIP is a C++ runtime API that allows developers to write portable code that can run on GPUs from AMD (such as those on OLCF's Frontier) and NVIDIA (such as those on Perlmutter). HIP is a portability layer (or wrapper) that uses the underlying GPU platform installed on the system, and is meant to have little to no performance impact compared with coding directly for ROCm or CUDA.

For more information and to register, please see <https://www.nersc.gov/users/training/events/introduction-to-hip-programming-july-14-2022/>.

### (NEW) HIP for CUDA Programmers, July 21 <a name="hip4cuda"/></a>

OLCF's HIP training series continues on July 21 with a training aimed at those with experience programming for CUDA who want to learn to "hipify" their codes. The training is open to NERSC users, but those who are not existing OLCF users will not be able to participate in the hands-on sessions.

In this tutorial, you will learn how to get started with "hipify"-ing existing CUDA code on Summit so it can run on both NVIDIA (e.g., Summit, Perlmutter) and ROCm (e.g., Frontier) platforms. The session will consist of a main presentation with hands-on exercises throughout.

For more information and to register, please see <https://www.nersc.gov/users/training/events/hip-for-cuda-programmers-july-21-2022/>.

### (NEW) Register Today for August 25 E4S at NERSC Training Event! <a name="e4snersc"/></a>

NERSC is hosting a half-day training on the Extreme-Scale Scientific Software Stack (E4S), a curated collection of open-source scientific software packages deployed via Spack, on August 25, 2022. NERSC has [multiple deployments of E4S](https://docs.nersc.gov/applications/e4s/) available for users on Perlmutter and Cori, and we find it an invaluable resource for anyone needing to satisfy software dependencies in order to compile and run their scientific workflows.

The session will include talks from ECP leadership and hands-on sessions on using Spack to deploy a mini software stack from E4S, as well as an inside look at the E4S deployment process at NERSC and the steps required to deploy E4S and release it to the general public. The event will be accessible online **or** in-person at NERSC.

For more information and to register, please see <https://www.nersc.gov/users/training/events/e4s-at-nersc-2022/>.
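If you would like to explore E4S before the training, the sketch below shows the general pattern for working with a Spack-based E4S deployment. The module name and the example package are assumptions for illustration; check the [E4S documentation](https://docs.nersc.gov/applications/e4s/) linked above for the module versions actually installed on Perlmutter and Cori.

```bash
# List the E4S deployments installed on the system (module names/versions are
# assumptions; see the NERSC E4S documentation for the current ones).
module avail e4s

# Loading an E4S module makes its pre-built Spack software stack available.
module load e4s

# Standard Spack commands then work against that stack.
spack find              # list the packages provided by the deployment
spack load hdf5         # example: add a package from the stack to your environment
```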
### Learn to Use Spin to Build Science Gateways at NERSC: Next SpinUp Workshop Starts August 10! <a name="spinup"/></a>

Spin is a service platform at NERSC based on Docker container technology. It can be used to deploy science gateways, workflow managers, databases, and all sorts of other services that can access NERSC systems and storage on the back end. New large-memory nodes have been added to the platform, increasing its potential for new memory-constrained applications.

To learn more about how Spin works and what it can do, please listen to the NERSC User News podcast on Spin: <https://anchor.fm/nersc-news/episodes/Spin--Interview-with-Cory-Snavely-and-Val-Hendrix-e1pa7p>.

Attend an upcoming SpinUp workshop to learn to use Spin for your own science gateway projects! Applications for sessions that begin [Wednesday, August 10](https://www.nersc.gov/users/training/spin/) are now open. SpinUp is hands-on and interactive, so space is limited. Participants will attend an instructional session and a hack-a-thon to learn about the platform, create running services, and learn maintenance and troubleshooting techniques. Local and remote participants are welcome.

If you can't make these upcoming sessions, never fear! The next session begins October 5, and more are planned for November and next year.

See a video of Spin in action at the [Spin documentation](https://docs.nersc.gov/services/spin/) page.

([back to top](#top))

---

## NERSC News <a name="section7"/></a> ##

### Come Work for NERSC! <a name="careers"/></a>

NERSC currently has several openings for postdocs, system administrators, and more! If you are looking for new opportunities, please consider the following openings:

- [Scientific Data Architect](http://m.rfer.us/LBL7BZ58O): Support a high-performing data and AI software stack for NERSC users, and collaborate on multidisciplinary, cross-institution scientific projects with scientists and instruments from around the world.
- [HPC Architecture and Performance Engineer](http://m.rfer.us/LBL1rb56n): Contribute to NERSC's understanding of future systems (compute, storage, and more) by evaluating their efficacy across leading-edge DOE Office of Science application codes.
- [Technical and User Support Engineer](http://m.rfer.us/LBLPYs4pz): Assist users with account setup, login issues, project membership, and other requests.
- [NESAP for Simulations Postdoctoral Fellow](http://m.rfer.us/LBLRUa4lS): Collaborate with computational and domain scientists to enable extreme-scale scientific simulations on NERSC's Perlmutter supercomputer.
- [Cyber Security Engineer](http://m.rfer.us/LBLa_B4hg): Join the team to help protect NERSC resources from malicious and unauthorized activity.
- [NESAP for Data Postdoctoral Fellow](http://m.rfer.us/LBLXEt4g5): Collaborate with computational and domain scientists to enable extreme-scale scientific data analysis on NERSC's Perlmutter supercomputer.
- [Machine Learning Postdoctoral Fellow](http://m.rfer.us/LBL2sf4cR): Collaborate with computational and domain scientists to enable machine learning at scale on NERSC's Perlmutter supercomputer.
- [HPC Performance Engineer](http://m.rfer.us/LBLsGT43z): Join a multidisciplinary team of computational and domain scientists to speed up scientific codes on cutting-edge computing architectures.

(**Note:** You can browse all our job openings on the [NERSC Careers](https://lbl.referrals.selectminds.com/page/nersc-careers-85) page, and all Berkeley Lab jobs at <https://jobs.lbl.gov>.)

We know that NERSC users can make great NERSC employees! We look forward to seeing your application.

### Upcoming Outages <a name="outages"/></a>

- **Cori**
    - 07/20/22 07:00-20:00 PDT, Scheduled Maintenance
    - 08/17/22 07:00-20:00 PDT, Scheduled Maintenance
    - 09/21/22 07:00-20:00 PDT, Scheduled Maintenance
- **Jupyter**
    - 07/07/22 08:00-09:00 PDT, Scheduled Maintenance

Visit <http://my.nersc.gov/> for the latest status and outage information.

### About this Email <a name="about"/></a>

You are receiving this email because you are the owner of an active account at NERSC. This mailing list is automatically populated with the email addresses associated with active NERSC accounts. In order to remove yourself from this mailing list, you must close your account, which can be done by emailing <accounts@nersc.gov> with your request.

_______________________________________________
Users mailing list
Users@nersc.gov
