BITS Pilani, K. K. Birla Goa Campus, India

DaSH Lab (Data, Systems and HPC)

Parallel File System


  • How do you load balance I/O across storage servers in a seamless manner?
  • What is an optimal striping layout for a file within an application?
  • Where should files be placed in a heterogeneous storage setup?
  • How do network and I/O interact with each other in a parallel file system?
  • How does I/O performance differ between parallel file systems and object stores?

Collaborators: Virginia Tech - USA, Johannes Gutenberg University Mainz - Germany
BITS Pilani Personnel: Joel Tony, Yash Bhisikar, Bhavya Bajaj, Kaaviya Uthirapandian, Arnav Borkar, Hari Vamsi, Tushar Barman, Sreenath M.
Other Personnel: Debasmita Biswas, Sarah Neuwirth
PFS investigated: Lustre, IBM Spectrum Scale, BeeGFS, Ceph
Few Published Papers: ACM TOS 2024, Cluster 2023 (1), Cluster 2023 (2), INDIS@SC21, Cluster19, IPDPS19


Data Caching


  • How can the I/O bottleneck be removed for large-scale deep learning applications?
  • How can we use importance sampling for distributed DL applications?
  • Can the I/O bottleneck in HPC applications be reduced by data prefetching?

Collaborators: Oak Ridge National Laboratory - USA, University of Virginia - USA, Virginia Tech - USA, Georgia State University
Other Personnel: Ahmad Maroof Karimi, Redwan Khan, Yue Cheng, Jong Choi, Lipeng Wan
Few Published Papers: FAST23, Cluster22


Resource Scheduling


  • How can distributed deep learning jobs be made GPU heterogeneity-aware?
  • How can distributed inferencing jobs be made GPU heterogeneity-aware?
  • How can containers be used in HPC environments?

Collaborators: Virginia Tech - USA, Purdue University - USA, IIT Indore, Kuwait University, Northwestern Polytechnical University - China
BITS Pilani Personnel: Aditya Shiva Sharma, Kinshuk Goel, Amey Patil, Mukul Krishnan
Other Personnel: James Davis, Siddharth Sharma, Hadeel Albahar, Nannan Zhao
Few Published Papers: CCGrid22, Cloud20


Supercomputer Log Analysis


  • How can Darshan I/O characterization logs be used to characterize ML workloads?
  • How can Darshan logs be used for HPC trace generation?
  • How are different supercomputing stacks used by varying HPC workloads?
  • How do DL workloads differ in academia, industry and national laboratories?

Collaborators: Virginia Tech - USA, Oak Ridge National Laboratory, Argonne National Laboratory, Lawrence Berkeley Laboratory
BITS Pilani Personnel: Natasha Meena Joseph, S Sai Vineet, Kunal Korgaonkar, Snehanshu Saha
Other Personnel: Ahmad Yazdani, Ahmad Maroof Karimi, Jong Choi, Phil Carns, Bing Xie, Suren Byna, Feiyi Wang, Jean Luca Bez
Few Published Papers: ICDCN23, MASCOTS21, HPDC22, HPDC22



Trust in Serverless


  • How do you form a framework for FaaS using personal devices?
  • How do you enable trust among the geo-distributed personal devices?
  • How does one schedule FaaS on these devices?
  • How is the cost of running FaaS provider services calculated?

BITS Pilani Personnel: Aalhad Sawane, Pranav Anand, Satyam Bansal, Pranay Varshney, Paras Mittal, Chinmay Rao, Amogh Balepur
Few Published Papers: Poster@HiPC23 (Best Undergraduate Poster Award)


Cross Cloud Compatibility


  • How do you overcome vendor lock-in with public cloud providers?
  • Can a middleware layer provide cross-compatibility among major cloud providers such as AWS, Azure and GCP?


Blockchain Smart Contracts


  • How does one remove bugs in smart contracts after deployment?
  • Are the upgrade mechanisms given by Ethereum usable for all smart contracts?
  • Which upgrade patterns are most popular and why?

BITS Pilani Personnel: Aishwarya Parab, Nirmal Govindaraj, Siddhant Kulkarni, Pranay Varshney, Kunal Korgaonkar
Few Published Papers: Poster@HiPC23


Container deduplication


  • What are the common layers in containers?
  • Can common layers be deduplicated in large clusters running large numbers of containers?
  • How can deduplication be solved in edge clusters with low resources?
  • Can we devise caching methods for caching updates in edge clusters?

Collaborators: Virginia Tech - USA, Northwestern Polytechnical University - China, Kuwait University, IBM Research - USA, University of Minnesota - USA
BITS Pilani Personnel: Naman Agarwal, Prithvi Vishak, Nakul Bhachawat
Other Personnel: Hadeel Albahar, Nannan Zhao, Ali Anwar
Few Published Papers: ACM TOS24, TPDS20



Systems for Federated Learning (FL)


  • How can multiple organizations collaborate in a privacy-preserving manner in FL?
  • Can the knowledge distillation process be improved in FL?
  • Are all edge devices needed to participate in an FL training process?
  • Can we optimize energy in an FL process?

Collaborators: Queen Mary University of London - UK, Rochester Institute of Technology - USA
BITS Pilani Personnel: Sarang S, Druva Dhakshinamoorthy, Yuvraj Singh Bhadauria, Arihant Bansal, Pinki Yadav, Aditya Shiva Sharma, Sidharth C. Vivek, Vijay Dharmaji, Subroto Majumder, Manit Tanwar
Other Personnel: Ahmed Sayed, M. Mustafa Rafique
Few Published Papers: Poster@HiPC23


MLOps


  • How can automated fault tolerance be integrated into an MLOps pipeline?
  • Can optimizations be done for deploying FL workloads using FLOps?
  • Can intelligent Infrastructure-as-Code (IaC) be deployed for MLOps?

BITS Pilani Personnel: Aishwarya Jayashankar, Ayush Bharadwaj


Data Caching in TinyML


  • How can distributed tinyML training be made aware of the heterogeneous edge environment?
  • Can the datasets used for tinyML training be cached given the low resources on the devices?

BITS Pilani Personnel: Advik Raj, Sidharth C. Vivek



Data Compression


  • Can we have a cost optimization model for running data compression algorithms on the cloud vs the end devices?
  • How do the different optimization parameters of zstd fare against each other?
  • How do large VM images fare with various levels of zstd?

Collaborators: Druva Inc.
BITS Pilani Personnel: Pinki Yadav, Vinayak Naik


DataSpaces


  • DataSpaces is a data management framework for scientific workflows.
  • Can optimizations be done for integrating HPC file systems into DataSpaces?
  • How can I/O intensive tasks be optimized in such a workflow?

Collaborators: University of Utah - USA
BITS Pilani Personnel: Joel Tony
Other Personnel: Manish Parashar


I/O Uring


  • How can kernel I/O optimizations affect different large-scale use-cases?
  • Which aspects of key-value stores and databases can benefit from I/O uring optimizations at the kernel level?

BITS Pilani Personnel: Arnav Gupta, Druva Dakshinamoorthy, Arjun Jagtap