List of publications

Rittenbach, A., Imes, C., & Walters, J. P. (2024). Timely Wildfire Perimeter Mapping for Unmanned Aerial Platforms. Applied Industrial Spectroscopy, FD1–7.
Imes, C., Rittenbach, A., Xie, P., Kang, D. I. D., Walters, J. P., & Crago, S. P. (2024). Evaluating Deep Learning Recommendation Model Training Scalability with the Dynamic Opera Network. Proceedings of the 4th Workshop on Machine Learning and Systems, 169–175.
Wang, H., Imes, C., Kundu, S., Beerel, P. A., Crago, S. P., & Walters, J. P. (2023). QuantPipe: Applying adaptive post-training quantization for distributed transformer pipelines in dynamic edge environments. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1–5.
Imes, C., King, D. W., & Walters, J. P. (2023). Distributed Edge Machine Learning Pipeline Scheduling with Reverse Auctions. 2023 Eighth International Conference on Fog and Mobile Edge Computing (FMEC), 196–203.
Hu, Y., Imes, C., Zhao, X., Kundu, S., Beerel, P. A., Crago, S. P., & Walters, J. P. (2022). PipeEdge: Pipeline parallelism for large-scale model inference on heterogeneous edge devices. 2022 25th Euromicro Conference on Digital System Design (DSD), 298–307.
Rittenbach, A., & Walters, J. P. (2021). Demonstration of a fully neural network based synthetic aperture radar processing pipeline for image formation and analysis. Sensors, Systems, and Next-Generation Satellites XXV, 11858, 98–109.
Hu, Y., Imes, C., Zhao, X., Kundu, S., Beerel, P. A., Crago, S. P., & Walters, J. P. N. (2021). Pipeline parallelism for inference on heterogeneous edge computing. ArXiv Preprint ArXiv:2110.14895.
Imes, C., Li, T.-M., Glines, M., Khan, R., & Walters, J. P. (2021). Distributed and Heterogeneous SAR Backprojection with Halide. 2021 IEEE High Performance Extreme Computing Conference (HPEC), 1–9.
Rittenbach, A., & Walters, J. P. (2020). RDAnet: A deep learning based approach for synthetic aperture radar image formation. ArXiv Preprint ArXiv:2001.08202.
Imes, C., Hofmeyr, S., Kang, D. I. D., & Walters, J. P. (2020). A case study and characterization of a many-socket, multi-tier NUMA HPC platform. 2020 IEEE/ACM 6th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC) and Workshop on Hierarchical Parallelism for Exascale Computing (HiPar), 74–84.
Imes, C., Colin, A., Zhang, N., Srivastava, A., Prasanna, V., & Walters, J. P. (2020). Compiler abstractions and runtime for extreme-scale SAR and CFD workloads. 2020 IEEE/ACM Fifth International Workshop on Extreme Scale Programming Models and Middleware (ESPM2), 1–7.
Kang, D. I. D., Walters, J. P., & Crago, S. P. (2020). Scalable parallel file write from a large NUMA system. HPEC.
Datta, K., Rittenbach, A., Kang, D.-I., Walters, J. P., Crago, S. P., & Damoulakis, J. (2019). Computational requirements for real-time ptychographic image reconstruction. Applied Optics, 58(7), B19–B27.
Tran, G. P., Walters, J. P., & Crago, S. (2019). Increased Fault-Tolerance and Real-Time Performance Resiliency for Stream Processing Workloads through Redundancy. 2019 IEEE International Conference on Services Computing (SCC), 51–55.
Tran, G. P., Walters, J. P., & Crago, S. (2018). Reducing tail latencies while improving resiliency to timing errors for stream processing workloads. 2018 IEEE/ACM 11th International Conference on Utility and Cloud Computing (UCC), 194–203.
Chen, Y.-A., Tran, G. P. C., Rittenbach, A. J., Walters, J. P., & Crago, S. (2018). Pacer: Automated Feedback-Based Vertical Elasticity for Heterogeneous Soft Real-Time Workloads. 2018 IEEE/ACM 11th International Conference on Utility and Cloud Computing (UCC), 73–82.
Kang, M., Kang, D.-I., Walters, J. P., & Crago, S. P. (2017). A comparison of system performance on a private OpenStack cloud and Amazon EC2. 2017 IEEE 10th International Conference on Cloud Computing (CLOUD), 310–317.
Tran, G. P. C., Walters, J. P., & Crago, S. P. (2017). Dynamically Improving Resiliency to Timing Errors for Stream Processing Workloads. 2017 18th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), 469–476.
Chen, Y.-A., Walters, J. P., & Crago, S. P. (2017). Load balancing for minimizing deadline misses and total runtime for connected car systems in fog computing. 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 683–690.
Tran, G. P. C., Chen, Y.-A., Kang, D.-I., Walters, J. P., & Crago, S. P. (2016). Automated demand-based vertical elasticity for heterogeneous real-time workloads. 2016 IEEE 9th International Conference on Cloud Computing (CLOUD), 831–834.
Tran, G. P. C., Chen, Y.-A., Kang, D.-I., Walters, J. P., & Crago, S. P. (2016). Hypervisor performance analysis for real-time workloads. 2016 IEEE High Performance Extreme Computing Conference (HPEC), 1–7.
Younge, A. J., Walters, J. P., Crago, S. P., & Fox, G. C. (2015). Supporting high performance molecular dynamics in virtualized clusters using IOMMU, SR-IOV, and GPUDirect. ACM SIGPLAN Notices, 50(7), 31–38.
Crago, S. P., & Walters, J. P. (2015). Heterogeneous cloud computing: The way forward. Computer, 48(1), 59–61.
Musleh, M., Pai, V., Walters, J. P., Younge, A., & Crago, S. (2014). Bridging the virtualization performance gap for HPC using SR-IOV for InfiniBand. 2014 IEEE 7th International Conference on Cloud Computing, 627–635.
Walters, J. P., Younge, A. J., Kang, D. I., Yao, K. T., Kang, M., Crago, S. P., & Fox, G. C. (2014). GPU passthrough performance: A comparison of KVM, Xen, VMWare ESXi, and LXC for CUDA and OpenCL applications. 2014 IEEE 7th International Conference on Cloud Computing, 636–643.
Younge, A. J., Walters, J. P., Crago, S., & Fox, G. C. (2014). Evaluating GPU passthrough in Xen for high performance cloud computing. 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 852–859.
Walters, J. P., Zick, K. M., & French, M. (2013). A practical characterization of a NASA SpaceCube application through fault emulation and laser testing. 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 1–8.
Younge, A. J., Walters, J. P., Crago, S., & Fox, G. C. (2013). Enabling high performance computing in cloud infrastructure using virtualized GPUs.
Schmidt, A. G., Walters, J. P., Zick, K. M., French, M., Keymeulen, D., Aranki, N., Klimesh, M., & Kiely, A. (2012). Applying radiation hardening by software to fast lossless compression prediction on FPGAs. 2012 IEEE Aerospace Conference, 1–10.
Zick, K. M., Yu, C.-C., Walters, J. P., & French, M. (2012). Silent data corruption and embedded processing with NASA’s SpaceCube. IEEE Embedded Systems Letters, 4(2), 33–36.
Pan, A., Walters, J. P., Pai, V. S., Kang, D.-I. D., & Crago, S. P. (2012). Integrating high performance file systems in a cloud computing environment. 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, 753–759.
Singh, K., Walters, J. P., Hestness, J., Suh, J., Rogers, C. M., & Crago, S. P. (2011). FFTW and complex ambiguity function performance on the Maestro processor. 2011 Aerospace Conference, 1–8.
Bucciero, M., Walters, J. P., Moussalli, R., Gao, S., & French, M. (2011). The PowerPC 405 memory sentinel and injection system. 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines, 154–161.
French, M., Walters, J. P., & Zick, K. (2011). Autonomous on-board processing for sensor systems: Initial fault tolerance and autonomy results. Earth Science Technology Forum.
Bucciero, M., Walters, J. P., & French, M. (2011). Software fault tolerance methodology and testing for the embedded PowerPC. 2011 Aerospace Conference, 1–9.
Crago, S., Dunn, K., Eads, P., Hochstein, L., Kang, D.-I., Kang, M., Modium, D., Singh, K., Suh, J., & Walters, J. P. (2011). Heterogeneous cloud computing. 2011 IEEE International Conference on Cluster Computing, 378–385.
Walters, J. P., Kost, R., Singh, K., Suh, J., & Crago, S. P. (2011). Software-based fault tolerance for the Maestro many-core processor. 2011 Aerospace Conference, 1–12.
Crago, S. P., Kang, D.-I., Kang, M., Kost, R., Singh, K., Suh, J., & Walters, J. P. (2011). Programming models and development software for a space-based many-core processor. 2011 IEEE Fourth International Conference on Space Mission Challenges for Information Technology, 95–102.
French, M., Walters, J., & Zick, K. (2011). Initial Fault Tolerance and Autonomy Results for Autonomous On-board Processing of Hyperspectral Imaging. AGU Fall Meeting Abstracts, 2011, IN42A–03.
Bakhtiari, M., Malhotra, H., Jones, M. D., Chaudhary, V., Walters, J. P., & Nazareth, D. (2010). Applying graphics processor units to Monte Carlo dose calculation in radiation therapy. Journal of Medical Physics, 35(2), 120–122.
Walters, J. P., Chaudhary, V., & Schmidt, B. (2010). Database Searching with Profile-Hidden Markov Models on Reconfigurable and Many-Core Architectures. In Bioinformatics: high performance parallel computer architectures (Vol. 2, p. 203). CRC Press.
Chaudhary, V., Walters, J. P. N., & Jiang, H. (2010). Computation Checkpointing and Migration. Nova Science Publishers, Inc.
Walters, J. P., Balu, V., Kompalli, S., & Chaudhary, V. (2009). Evaluating the use of GPUs in liver image segmentation and HMMER database searches. 2009 IEEE International Symposium on Parallel & Distributed Processing, 1–12.
Walters, J. P., & Chaudhary, V. (2009). A fault-tolerant strategy for virtualized HPC clusters. The Journal of Supercomputing, 50, 209–239.
Walters, J. P., Darole, R., & Chaudhary, V. (2009). Improving MPI-HMMER’s scalability with parallel I/O. 2009 IEEE International Symposium on Parallel & Distributed Processing (IPDPS 2009), 1–11.
Walters, J. P., Landman, J., & Chaudhary, V. (2009). MPI-HMMER. URL: http://www.mpihmmer.org.
Walters, J. P., Chaudhary, V., Cha, M., Guercio, S., & Gallo, S. (2008). A comparison of virtualization technologies for HPC. 22nd International Conference on Advanced Information Networking and Applications (AINA 2008), 861–868.
Walters, J. P., & Chaudhary, V. (2008). Replication-based fault tolerance for MPI applications. IEEE Transactions on Parallel and Distributed Systems, 20(7), 997–1010.
Walters, J. P., Bantwal, B., & Chaudhary, V. (2008). Enabling interactive jobs in virtualized data centers. Cloud Computing and Its Applications.
Walters, J. P., Landman, J., & Chaudhary, V. (2008). Optimized Cluster-Enabled HMMER Searches. In Grid computing for bioinformatics and computational biology (pp. 51–70). John Wiley & Sons, Inc.
Walters, J. P., Balu, V., Chaudhary, V., Kofke, D., & Schultz, A. (2008). Accelerating molecular dynamics simulations with GPUs. ISCA PDCCS, 44–49.
Walters, J. P., Liang, Z., Shi, W., & Chaudhary, V. (2007). Wireless sensor network security: A survey. In Security in distributed, grid, mobile, and pervasive computing (p. 367). CRC Press: Boca Raton, FL, USA.
Walters, J. P., Meng, X., Chaudhary, V., Oliver, T., Yeow, L. Y., Schmidt, B., Nathan, D., & Landman, J. (2007). MPI-HMMER-Boost: Distributed FPGA acceleration. The Journal of VLSI Signal Processing, 48(3), 223–238.
Walters, J., & Chaudhary, V. (2007). A comprehensive user-level checkpointing strategy for MPI applications. Technical Report, TR 2007-1.
Walters, J., & Chaudhary, V. (2007). A scalable asynchronous replication-based strategy for fault tolerant MPI applications. High Performance Computing, HiPC 2007, 257–268.
Rood, B., Walters, J. P., Chaudhary, V., & Lewis, M. J. (2007). Failure Prediction and Scalable Checkpointing for Reliable Large-Scale Grid Computing. IEEE HPDC’07.
Walters, J. P. N. (2007). Fault-tolerant techniques for high performance computing and a bioinformatics application [PhD thesis]. ProQuest.
Walters, J. P., Qudah, B., & Chaudhary, V. (2006). Accelerating the HMMER sequence analysis suite using conventional processors. 20th International Conference on Advanced Information Networking and Applications-Volume 1 (AINA’06), 1, 6 pp.
Walters, J., & Chaudhary, V. (2006). Application-level checkpointing techniques for parallel programs. Distributed Computing and Internet Technology, 221–234.
Landman, J., Ray, J., & Walters, J. P. (2006). Accelerating HMMer searches on Opteron processors with minimally invasive recoding. 20th International Conference on Advanced Information Networking and Applications-Volume 1 (AINA’06), 2, 5 pp.
Walters, J. P., Jiang, H., & Chaudhary, V. (2006). An adaptive heterogeneous software DSM. 2006 International Conference on Parallel Processing Workshops (ICPPW’06), 8 pp.
Jiang, H., Chaudhary, V., & Walters, J. P. (2006). Data Conversion for Heterogeneous Migration/Checkpointing. In High-performance computing: paradigm and infrastructure (Vol. 44, p. 241). John Wiley and Sons.
Jiang, H., Chaudhary, V., & Walters, J. P. (2003). Data conversion for process/thread migration and checkpointing. Proceedings of the 2003 International Conference on Parallel Processing, 473–480.
Younge, A. J., Walters, J. P., Suh, J., Kang, D.-I. D., Park, Y., Crago, S. P., & Fox, G. C. Towards a high performance virtualized IaaS deployment.
Pai, V. S., Crago, S. P., Kang, D.-I., Kang, M., Singh, K., Suh, J., Walters, J. P., & Younge, A. J. Virtualized Cloud Computing for Exascale Performance.
Powell, W., Butler, J., Duncan, B., Oyama, J., Petrick, D., Green, C., Henriquez, D., Azarbarzin, A., Bibyk, I., Clancy, P., & others. Organizing Committee 2021 Space Computing Conference.
Crago, S., Dickenson, J., Duncan, B., Earnest, H., Enright, D., Fraeman, M., Green, C., Hammer, I., Ling, K., MacKinnon, J., & others. Reviewers 2021 Space Computing Conference.
Imes, C., Rittenbach, A., Xie, P., Kang, D. I. D., Walters, J. P., & Crago, S. P. Deep Learning Recommendation Model Training Co-design with the Dynamic Opera Network.
Wang, H., Liu, Z., Fang, C., Walters, J. P., & Crago, S. P. MoQ: Mixture-of-format Activation Quantization for Communication-efficient AI Inference System. NeurIPS 2024 Workshop Machine Learning with New Compute Paradigms.

JP Walters, PhD