Duke Physics Department Crunches Data with JetStor RAID Arrays
Scientific research can produce huge volumes of data for analysis, a fact the Physics Department at Duke University knows well. A single one of the department’s 12-plus research groups, which explore nuclear physics, particle physics, and other exotic fields, can generate a terabyte of data in a month.
To meet the storage challenge this data presented, the research groups used direct-attached storage. However, this approach limited storage to the size of each computer’s hard drive and impeded data sharing among researchers. It also barely kept pace with the intense data bursts generated by the groups’ experiments.
Read more below about how Duke’s Physics Department deployed high-speed JetStor RAID Arrays to absorb “bursty” data generation and enable the storing and sharing of large data sets.
JetStor® RAID Arrays Support Cutting-Edge Physics Research at Duke University
At the Duke University Physics Department in Durham, North Carolina, some 600 faculty, graduate students, postdocs, researchers, and visiting scholars work in such fields as nuclear physics, condensed matter physics, high energy physics, photon physics, and quantum optics. The department has 1,500 computers and workstations in facilities like the new 280,000-square-foot French Family Science Center, which has laboratories for genomics, biological chemistry, materials science, nanoscience, physical biology, and bioinformatics.
Research at Duke’s physics department generates vast amounts of data that must be stored for analysis and peer review. Not all the data are quantitative, however. Research groups such as the biophysics workgroup investigating the dynamics of cardiac muscles often rely on high-speed cameras that produce huge quantities of images very rapidly. Just one research group can create a terabyte of data each month, and the Duke Physics Department has more than 12 such groups. Additionally, unlike commercial enterprises, which often produce fairly consistent volumes of data from week to week, the physics department generates data that is “bursty.” Sophisticated, often custom-built applications may sit dormant for a while and then rapidly churn out data for the duration of an experiment.
The Duke Physics Department initially met its substantial storage needs with direct-attached storage (DAS), in which large hard drives were linked directly to computers without a network in between. Stored data, however, were accessible only from the attached computer, and the amount of data that could be preserved was limited by the capacity of each hard drive. Although investigative science is highly collaborative, researchers could not easily share data, which slowed analyses and the discoveries that follow from them.
“We required a more advanced storage strategy, one worthy of the science we conduct,” said Jimmy D., senior IT manager for the Duke University Physics Department. “We needed solutions that are scalable and fast enough to ingest large troves of data very quickly.”
The solution: ten JetStor RAID Arrays from Advanced Computer & Network Corporation (AC&NC), a mix of JetStor SAS 516iS and JetStor SATA 416iS 16-bay iSCSI models populated with 2 TB disks.
• JetStor SAS 516iS iSCSI RAID Arrays with Gigabit Ethernet iSCSI links to a Dell PowerConnect 5424 switch
• JetStor SATA 416iS iSCSI RAID Arrays with Gigabit Ethernet iSCSI links to a Dell PowerConnect 5424 switch
• Dell iSCSI PowerConnect 5424 Optimized Switch
BENEFITS IMMEDIATELY REALIZED
By clustering ten JetStor RAID Arrays into an iSCSI storage area network (SAN), the Duke Physics Department gained a storage infrastructure that fully supports its ongoing scientific research. The theoretical nuclear physics group, for example, stores its experimental data on five JetStor platforms, and another group using high-speed cameras to investigate particle flows saves some 20,000 high-resolution images on JetStor solutions. Physicists working with national laboratories like Fermilab and Brookhaven deploy the devices to house large data sets locally to expedite analyses of research results.
“We built a robust SAN without the costs and complexities of Fibre Channel by using the iSCSI connectivity of the JetStors,” said Jimmy. “Because data is striped across multiple disks in each array, the JetStors offer the throughput demanded by even the very bursty data production of our lab work.”
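The throughput argument rests on striping: a burst of writes is cut into fixed-size chunks that land on different disks, so many drives work in parallel instead of one drive absorbing the whole burst. The Python sketch below is a simplified model, not the JetStor controller’s actual layout. It ignores RAID 6 parity placement, and the function name `stripe_layout`, the chunk count, and the assumption of 14 data disks per 16-bay array are illustrative only.

```python
# Simplified block striping (hypothetical model): consecutive chunks of a
# write are spread round-robin across the data disks of one array.
# Parity placement, which RAID 6 rotates across all disks, is ignored here.

def stripe_layout(num_chunks: int, data_disks: int) -> dict[int, list[int]]:
    """Return {disk_index: [chunk indices stored on that disk]}."""
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    layout: dict[int, list[int]] = {d: [] for d in range(data_disks)}
    for chunk in range(num_chunks):
        # Chunk i lands on disk i mod N, in stripe row i // N.
        layout[chunk % data_disks].append(chunk)
    return layout


if __name__ == "__main__":
    # Hypothetical burst: 28 chunks written to an array with 14 data disks.
    layout = stripe_layout(28, 14)
    busiest = max(len(chunks) for chunks in layout.values())
    print(f"each disk handles at most {busiest} of 28 chunks")
```

Because no disk receives more than about `num_chunks / data_disks` chunks, a burst is shared evenly, which is why the number of disks in the stripe matters more than the speed of any single drive.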
The physics department also relies on the JetStor arrays to back up Linux servers and a Mac OS server that uses Apple’s Time Machine application to back up data from Mac laptops. Administrators even use the JetStor systems to support the department’s web site. “We can allocate storage to workgroups as needed and add capacity without disrupting the production environment,” added Jimmy. “Our JetStors also ensure no data is lost, which is vital because repeating experiments to reacquire lost data is expensive and time consuming. Our physicists can now perform the most rigorous lab work with confidence that storage will never be a bottleneck or impediment.”
HOW WE DID IT
To build its storage environment, the Duke Physics Department attached its JetStor RAID Arrays to a Dell iSCSI PowerConnect 5424 Optimized Switch over Gigabit Ethernet iSCSI links. The switch also provides Gigabit Ethernet connections to a variety of servers, mostly Dell and Sun machines plus one Mac system, and links to the department’s 10 Gigabit Ethernet production network. The JetStor platforms are configured for RAID 6, which stripes data at the block level and stores two independent parity blocks per stripe, so an array loses no data even if two of its disks fail.
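RAID 6 tolerates two failed disks because each stripe carries two independent parity values. The sketch below assumes the common P+Q construction, in which P is a plain XOR of the data and Q weights each disk by a power of a generator in the finite field GF(2^8) (the scheme used by, for example, the Linux md driver). The JetStor controller’s internal implementation is not described in this article and may differ. The sketch works on one byte per disk, covers only the case where two data disks fail, and all function names and sample values are hypothetical.

```python
from typing import List, Optional, Tuple

# Log/antilog tables for GF(2^8) built from the primitive polynomial
# x^8 + x^4 + x^3 + x^2 + 1 (0x11D) with generator g = 2.
EXP = [0] * 512  # doubled so EXP[LOG[a] + LOG[b]] never needs a modulo
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements (addition in GF(2^8) is plain XOR)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_inv(a: int) -> int:
    """Multiplicative inverse of a nonzero field element."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return EXP[255 - LOG[a]]


def syndromes(data: List[int]) -> Tuple[int, int]:
    """P = XOR of the data bytes; Q = XOR of g^i * D_i over disks i."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(EXP[i], d)  # EXP[i] is g^i
    return p, q


def recover_two(data: List[Optional[int]], p: int, q: int, x: int, y: int) -> List[int]:
    """Rebuild data disks x and y (x != y) from the survivors plus P and Q."""
    survivors = [0 if i in (x, y) else d for i, d in enumerate(data)]
    pxy, qxy = syndromes(survivors)  # parity recomputed with failed disks zeroed
    a = p ^ pxy  # equals Dx + Dy
    b = q ^ qxy  # equals g^x*Dx + g^y*Dy
    # Substituting Dy = a + Dx gives b + g^y*a = (g^x + g^y) * Dx.
    dx = gf_mul(b ^ gf_mul(EXP[y], a), gf_inv(EXP[x] ^ EXP[y]))
    dy = a ^ dx
    restored = list(survivors)
    restored[x], restored[y] = dx, dy
    return restored


if __name__ == "__main__":
    disks = [0x3A, 0x7F, 0x00, 0xC4, 0x19, 0xE2]  # made-up byte per data disk
    p, q = syndromes(disks)
    damaged: List[Optional[int]] = list(disks)
    damaged[1] = damaged[4] = None  # simulate two failed drives
    print(recover_two(damaged, p, q, 1, 4) == disks)  # True
```

A single parity value (as in RAID 5) gives one equation and can solve for only one unknown disk; the second, independently weighted Q value supplies the second equation needed to solve for two.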
The department provisions the arrays with 2-terabyte disks, giving it more than 100 terabytes of storage capacity. “Between the bandwidth on our network and the throughput of our JetStors, we can support extreme data generation,” said Jimmy. “When a workgroup is created or a data-intensive experiment is conducted, we can ensure that researchers have access to fast, reliable storage.”
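As a rough check on the capacity figure, the arithmetic below estimates raw and usable space. It assumes every one of the 16 bays in all ten arrays holds a 2 TB disk and that each array is configured as a single RAID 6 set with no hot spares; the article states only “more than 100 terabytes,” so these are illustrative upper-bound numbers in decimal terabytes, before file-system overhead.

```python
def raid6_usable_tb(disks_per_array: int, disk_tb: float) -> float:
    """Usable capacity of one RAID 6 set: two disks' worth goes to parity."""
    if disks_per_array < 4:
        raise ValueError("RAID 6 needs at least 4 disks")
    return (disks_per_array - 2) * disk_tb


if __name__ == "__main__":
    arrays, bays, disk_tb = 10, 16, 2.0  # assumes every bay is populated
    raw = arrays * bays * disk_tb
    usable = arrays * raid6_usable_tb(bays, disk_tb)
    print(f"raw {raw:.0f} TB, usable {usable:.0f} TB")  # raw 320 TB, usable 280 TB
```

Under these assumptions RAID 6 costs two disks per array, and the result comfortably exceeds the 100-terabyte figure; partially populated arrays or hot spares would lower it.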
Administrators use JetStor RAID Manager, a web-based application, to manage the storage systems. They can easily start up or shut down any array using readily accessible controls on the devices, and can quickly identify any disk within an array that might be malfunctioning. “This enables us to quickly adapt storage to our very fluid research environment,” Jimmy concluded.