This week at the University of Colorado Research Computing
Tech. Team...
New HPC Storage Admin, Patricia Egeland
We're bringing on a new team member, Patricia Egeland, as HPC
Storage Administrator, starting Tuesday. Patricia will share general
system administration duties with the other RC tech. team operational
staff, but will carry primary responsibility for RC data storage
systems, notably RC Core Storage and the PetaLibrary. She also plans
to contribute to ongoing and upcoming software development efforts at
RC, and we're looking forward to seeing more interest in that space
both internally and in our user community.
Patricia worked most recently as a systems analyst for the Dark Energy
Survey (DES) Science Portal, and previously worked as a server,
system, and application administrator for the CERN Compact Muon
Solenoid (CMS) experiment.
We couldn't be more pleased to have Patricia as a member of the RC
tech. team! Please join me in welcoming her if you have an opportunity
to work with her.
Final deployment of HPCF UPS
We'll have the final of our three UPS-related HPCF outages starting on
Wednesday, and closing out in the afternoon on Friday. During this
outage we'll be...
- re-routing the power cabling between the UPS infrastructure and the
  in-row power-distribution infrastructure for future maintainability;
- decommissioning the legacy UPS;
- installing additional in-row power distribution infrastructure;
- and bringing the new UPS into full production.
The new HPCF UPS will provide not only power conditioning (remediating
a power quality issue that has led to several past Summit compute
outages) but also at least 15 minutes of UPS-backed runtime in the
event of a complete utility power outage (which should eventually
provide us sufficient time to power the system off in a controlled
manner).
PetaLibrary/2 RFP goes live
Research Computing's successful PetaLibrary service is getting a
refresh! Or, at least, that's the intent. We're publishing an RFP
today for a new, unified infrastructure, which should extend the life
of the PetaLibrary, simplify our service offerings, make the
infrastructure more maintainable, and eventually allow us to add
additional features and services.
- Monday, 31 July 2017: RFP posted online
- Friday, 4 August 2017 (09:00): Optional pre-bid call
- Friday, 11 August 2017: Written questions due
- Tuesday, 5 September 2017: RFP responses due
https://bids.sciquest.com/apps/Router/PublicEvent?CustomerOrg=Colorado
XSEDE SSO hub authentication progress
RMACC Summit is intended, as its full name implies, to be an RMACC
resource, not just a CU or CSU resource. We've planned from the
beginning to support access to Summit through XSEDE credentials, but
this has required additional (though already planned) service
development at XSEDE. Those services are ready for beta testing now,
and CU is on hand as an early adopter for their new "single sign-on
hub for L3 service providers" service (XCI-36). We're working on
deploying this now, and will hopefully be able to start bringing on
early adopters from the RMACC community soon.
Misc. other things
We're rebuilding the RC login environment. We've been through a few
prototype efforts, but the current plan is to start by deploying a
new tutorial login node, tlogin1, which will also be the first
recipient of new XSEDE authentication services.
We're continuing to develop our internal "curc-bench" automated
benchmarking utility for validating the performance of RC HPC
resources over time (notably after we make changes). Development is
primarily driven by Aaron Holt.
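curc-bench itself is internal and its interface isn't described here, but the core idea of validating performance over time can be sketched generically: time a benchmark, compare the result against a stored baseline, and flag regressions beyond a tolerance. All function and file names below are hypothetical illustrations, not curc-bench's actual API.

```python
import json
import time
from pathlib import Path


def run_benchmark(fn, repeats=3):
    """Run fn several times and return the best wall-clock time in seconds."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return min(times)


def check_regression(name, seconds, baseline_file, tolerance=0.10):
    """Compare a new timing against a stored baseline.

    Returns True if the run is within `tolerance` of the recorded
    baseline (or if no baseline exists yet), and updates the baseline
    file so future runs are compared against the best time seen so far.
    """
    path = Path(baseline_file)
    baselines = json.loads(path.read_text()) if path.exists() else {}
    baseline = baselines.get(name)
    ok = baseline is None or seconds <= baseline * (1 + tolerance)
    baselines[name] = seconds if baseline is None else min(seconds, baseline)
    path.write_text(json.dumps(baselines))
    return ok
```

Running this after every system change (kernel update, firmware, scheduler tweak) gives a simple pass/fail signal on whether performance has quietly degraded, which is the role the text describes for curc-bench.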
We had to rebuild Sneffels (originally "the viz cluster") after a
security incident. That work is largely done, and service has been
restored, but OIT is still reviewing viz1 as part of our incident
response process.
We're updating the Globus software for our data-transfer service,
starting with dtn01. We're further taking this opportunity to
re-build our DTN configuration in general, which should lead to
better and more reliable data-transfer performance due to the
correction of a number of networking irregularities on these
servers. This work is being done primarily by Dan Milroy.