HPC: Meeting the Security Requirements and Computational Needs of Researchers


Erin Shaw and Maureen Dougherty, University of Southern California, June 2018.

A new research project used a large smart meter dataset from residential electricity customers to understand how urban heat affects residential electricity consumption. The project required high-performance computing and, under a strict non-disclosure agreement, a secure computing environment. ACI-REFs (Advanced Cyberinfrastructure Research and Education Facilitators) and system administrators at the University of Southern California’s Center for High-Performance Computing worked with the research team to meet the sponsor’s security requirements and the team’s computational needs.

The Objective

University of Southern California faculty members Dr. Kelly Twomey Sanders and Dr. George Ban-Weiss are Assistant Professors of Civil and Environmental Engineering. Their project, “Understanding the Role of Urban Climate Variability in Affecting Residential Electricity Consumption,” set out to assess how urban heat affects residential electricity consumption. The proposed research targeted the Los Angeles metropolitan area because it offered the opportunity to observe how variations in urban temperature affect electricity use across relatively small spatial extents with widely varying microclimates, building stocks, and population vulnerabilities. The research relied on high spatiotemporal resolution electricity consumption data, recently made available by California’s investor-owned utilities, representing over 20 million customers across the state.

Smart meter data were obtained from a Regional Utility Provider (RUP) for approximately 200,000 residential electricity customers. Each customer had approximately 17,520 data records, representing two years of hourly data. The use of USC’s Center for High-Performance Computing (HPC) cluster was needed because the project required the storage and analysis of approximately 3.5 billion data points (about 17,520 records for each of the 200,000 customers) and because the data were under a strict non-disclosure agreement (NDA). HPC administrators had to ensure that the data were transferred securely, remained encrypted at rest, and stayed within a secure computing environment at all times.

The Solution

In December 2016, Drs. Sanders and Ban-Weiss reached out to USC’s Information Technology Services (ITS) Security Director Robert Lau and USC’s ACI-REF Principal Investigator (PI) Maureen Dougherty for help in responding to the NDA that they were negotiating with the RUP. They wanted to ensure that HPC’s Secure Data Account (HSDA) environment, which was under development at the time, would address both the RUP’s security requirements and the researchers’ computational needs. A third-party security audit consultant joined the team to address the NDA requirements and the HSDA workflow. Working together, the ITS team negotiated with the RUP and the USC Office of Compliance, and the researchers obtained approval for the NDA. Shortly after the NDA approval, the RUP presented additional security logistics and questionnaires, which required new reviews, assistance with responses, and further negotiation on behalf of the researchers. With the ACI-REF PI coordinating all USC parties, final approval and data access were obtained in June 2017.

An initial ACI-REF engagement took place on July 26, 2017, when Dr. Ban-Weiss and Mo Chen, the graduate research fellow on the project, visited HPC’s facility to meet with Erin Shaw and Cesar Sul, USC’s ACI-REF research computing facilitators. HPC’s HSDA environment was in early adoption at the time, so the facilitators brought in Jimi Chu, the lead computer scientist on the HSDA project. Mo already had an HPC account and so, despite being added to the HSDA account set up for the new research project, he could not initially log in to the encrypted directory. Erin and Cesar identified the cause of the problem, which led Erin and Bill Jendrzejek, HPC’s accounts manager, to adjust the account creation workflow and update the account application guide for future HSDA applicants.

During subsequent engagements, the login procedure was tested using SSH over a virtual private network. The secure computing environment required DUO two-factor authentication for logging in to the secure head node, which researchers use to migrate data, test code, and submit jobs to the cluster. Erin and Cesar worked with Mo and Jimi to test the special command line arguments that now had to be passed through DUO for the login to succeed. Following that, Mo came to HPC office hours, which are held each week on USC’s main campus, to work with Erin and Cesar on transferring the data from an external private location to the new secure head node. At this point, Mo was able to begin performing data analysis using R and Python. Regression models were developed to express hourly electricity consumption as a function of climate and site parameters such as temperature, humidity, precipitation, housing characteristics, tree cover, and concrete cover.

The analysis exercised HPC’s queuing system and parallel file system for the first time under HSDA and was watched closely by the ACI-REFs. During this time, inconsistencies between HPC’s standard and secure computing environments were identified by Mo, confirmed by the ACI-REFs, and explained or adjusted by Jimi and the HPC system administration team. For example, when Mo had trouble running a graphical viewer, the ACI-REFs helped debug X11 forwarding, which was not working in the new secure environment. Throughout the project, the ACI-REFs updated the user documentation accordingly.


Example of a home illustrating the stationary point temperature (SPT) and electricity-temperature sensitivity through a segmented linear regression method.
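
The segmented linear regression named in the caption above can be illustrated with a short Python sketch. This is not the researchers’ code. It assumes that the stationary point temperature is the breakpoint of a continuous two-segment line and that the electricity-temperature sensitivity is the slope above that breakpoint. The function name fit_segmented, the grid-search approach, and all data values are hypothetical; the data are synthetic and stand in for one home’s hourly readings.

```python
import numpy as np


def fit_segmented(temps, loads, candidates):
    """Fit a continuous two-segment line by grid search over the breakpoint.

    Hypothetical helper for illustration; not taken from the study.
    """
    temps = np.asarray(temps, dtype=float)
    loads = np.asarray(loads, dtype=float)
    best = None
    for c in candidates:
        # Design matrix: intercept, temperature, and a hinge active above c.
        hinge = np.maximum(temps - c, 0.0)
        X = np.column_stack([np.ones_like(temps), temps, hinge])
        coef, *_ = np.linalg.lstsq(X, loads, rcond=None)
        sse = float(np.sum((loads - X @ coef) ** 2))
        if best is None or sse < best["sse"]:
            best = {
                "breakpoint": float(c),                   # assumed SPT
                "slope_below": float(coef[1]),
                "slope_above": float(coef[1] + coef[2]),  # assumed sensitivity
                "sse": sse,
            }
    return best


# Synthetic, made-up readings for a single home (not study data):
# flat load below 20 degrees, rising 0.08 units per degree above it.
rng = np.random.default_rng(0)
temps = rng.uniform(5.0, 40.0, size=2000)
true_break, true_slope = 20.0, 0.08
loads = 0.6 + true_slope * np.maximum(temps - true_break, 0.0)
loads += rng.normal(0.0, 0.05, size=temps.size)

fit = fit_segmented(temps, loads, candidates=np.arange(10.0, 35.0, 0.5))
print(fit)
```

On this synthetic home, the fitted breakpoint and upper slope should land close to the values used to generate the data. A production analysis would more likely use a dedicated segmented-regression routine and would have to handle homes that show no clear breakpoint.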

The Result

In January 2018, the researchers submitted their first paper, “The role of household level electricity data in improving estimates of the impacts of climate on building electricity use,” by Mo Chen, George A. Ban-Weiss, and Kelly T. Sanders. The outcomes of the study bridge gaps between the energy modeling and climate communities and offer a quantitative understanding of how electricity usage is affected by various climatic parameters. Stakeholders such as utilities, regional transmission operators, and grid operators will benefit from greater insight into future capacity expansion, while the research community will benefit from advances in urban heat island and energy modeling. The resulting research methods will be applied to other cities in the United States and around the globe in future work.

“This important research would honestly not be possible without the combination of computational power and high security, not to mention technical expertise of the support and research computing staff, of the HPC” –George Ban-Weiss

In summary, this was the first use of HPC’s secure data system, and it resulted in the fine-tuning of security workflows and the final production workflow. Multiple integrated components were tested, including SSH, DUO authentication, the transfer of files to the encrypted account, secure queue scheduling and resource management, X11 forwarding, and the use of the cluster’s parallel file system. The HSDA environment is now deployed, and researchers from Gerontology and Preventive Medicine are actively using it for data analysis. Additionally, this HSDA pilot helped finalize the secure data account application and processing procedures, as well as the user documentation. ACI-REFs continue to coordinate security evaluations of new workflows proposed by our researchers with the ITS security group and our external third-party security auditors. HPC’s ACI-REFs are actively working with several groups who are exploring HSDA as their computational solution for studies in areas such as brain tumors and complications due to diabetes.

Notable Publications and Presentations Resulting from this Work

Dr. Ban-Weiss received a 2018 National Science Foundation CAREER Award for outstanding work in the classroom and innovative approaches to research.

Collaborators and Resources

Collaborators

  • Maureen Dougherty, Director of High-Performance Computing, USC
  • Dr. Jimi Chu, Systems Administrator, High-Performance Computing, USC
  • Dr. Kelly Twomey Sanders, Assistant Professor, Department of Civil and Environmental Engineering, USC
  • Dr. George Ban-Weiss, Assistant Professor, Department of Civil and Environmental Engineering, USC
  • Mo Chen, Graduate Research Fellow, Department of Civil and Environmental Engineering, USC
  • Erin Shaw, Research Computing Facilitator, High-Performance Computing, USC
  • Cesar Sul, Research Computing Facilitator, High-Performance Computing, USC
  • Bill Jendrzejek, Account Administrator, High-Performance Computing, USC
  • Robert Lau, Director, Information Security Systems, USC

Dr. Kelly Twomey Sanders
Civil & Environmental Engineering


Dr. George Ban-Weiss
Civil & Environmental Engineering


Graduate Student Mo Chen
Civil & Environmental Engineering


ACI-REF Erin Shaw
USC High-Performance Computing


ACI-REF Cesar Sul
USC High-Performance Computing

Resources

All work was performed at USC’s Center for High-Performance Computing. The HPC computing resource consists of two low-latency, high-bandwidth Linux clusters: 457 public and 745 condo’d nodes on a 56-gigabit InfiniBand backbone and 109 public and 1372 condo’d nodes on a 10-gigabit Myricom backbone.

Funding Sources

The work described in this case study was supported in part by a grant from the National Science Foundation, Award #1341935, Advanced Cyberinfrastructure – Research and Educational Facilitation: Campus-Based Computational Research Support, and in part by USC’s Information Technology Services and Center for High-Performance Computing. The research project described was supported in part by faculty start-up funding and an internal departmental fellowship, and in part by a grant from the National Science Foundation. Southern California Edison provided the dataset.