- S. U. Hussain, M. Samragh, X. Zhang, K. Huang and F. Koushanfar, On the Application of Binary Neural Networks in Oblivious Inference, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, May, 2021.
- S. U. Hussain, S. M. Riazi, and F. Koushanfar, The Fusion of Secure Function Evaluation and Logic Synthesis, IEEE Security & Privacy, March, 2021.
- S. U. Hussain, B. Li, F. Koushanfar, and R. Cammarota, TinyGarble2: Smart, Efficient, and Scalable Yao’s Garble Circuit, Workshop on Privacy-Preserving Machine Learning in Practice (PPMLP), November, 2020.
- R. Cammarota et al., Trustworthy AI Inference Systems: An Industry Research View, Workshop on Privacy-Preserving Machine Learning in Practice (PPMLP), November, 2020.
- H. Chen, S. U. Hussain, F. Boemer, E. Stapf, A.-R. Sadeghi, F. Koushanfar, and R. Cammarota, Developing Privacy-Preserving AI Systems: The Lessons Learned, Design Automation Conference (DAC), July, 2020.
- S. U. Hussain and F. Koushanfar, FASE: FPGA Acceleration of Secure Function Evaluation, Field-Programmable Custom Computing Machines (FCCM), April, 2019.
- S. M. Riazi, M. Javaheripi, S. U. Hussain and F. Koushanfar, MPCircuits: Optimized Circuit Generation for Secure Multi-Party Computation, Hardware Oriented Security and Trust (HOST), June, 2019.
- E. Songhori, S. M. Riazi, S. U. Hussain, and F. Koushanfar, ARM2GC: Succinct Garbled Processor for Secure Computation, Design Automation Conference (DAC), June, 2018.
- S. U. Hussain, S. M. Riazi, and F. Koushanfar, SHAIP: Secure Hamming Distance for Authentication of Intrinsic PUFs, ACM Transactions on Design Automation of Electronic Systems (TODAES), 23.6, p.75, 2018.
- S. U. Hussain and F. Koushanfar, P3: Privacy Preserving Positioning for Smart Automotive Systems, ACM Transactions on Design Automation of Electronic Systems (TODAES), 23.6, p.79, 2018.
- S. U. Hussain, B. D. Rouhani, M. Ghasemzadeh, and F. Koushanfar, MAXelerator: FPGA Accelerator for Privacy Preserving Multiply-Accumulate (MAC) on Cloud Servers, Design Automation Conference (DAC), July, 2018.
- S. U. Hussain, and F. Koushanfar, Privacy Preserving Localization for Smart Automotive Systems, Design Automation Conference (DAC), June, 2016.
- S. U. Hussain, M. Majzoobi, and F. Koushanfar, A Built-In-Self-Test Scheme for Online Evaluation of Physical Unclonable Functions and True Random Number Generators, IEEE Transactions on Multi-Scale Computing Systems, vol. 2, issue 99, January, 2016.
- E. Songhori, S. U. Hussain, A.-R. Sadeghi, and F. Koushanfar, Compacting privacy-preserving k-nearest neighbor search using logic synthesis, Design Automation Conference (DAC), June, 2015.
- E. Songhori, S. U. Hussain, A.-R. Sadeghi, T. Schneider, and F. Koushanfar, TinyGarble: Highly Compressed and Scalable Sequential Garbled Circuits, IEEE Symposium on Security and Privacy (S&P), May, 2015.
- S. U. Hussain, S. Yellapantula, M. Majzoobi, and F. Koushanfar, BIST-PUF: Online, Hardware-based Evaluation of Physically Unclonable Circuit Identifiers, International Conference on Computer-Aided Design, November, 2014.
- M. K. Paul, M. A. K. Sagar, S. U. Hussain, and A. B. M. H. Rashid, UWB microwave imaging via modified beamforming for early detection of breast cancer, IEEE International Conference on Electrical and Computer Engineering (ICECE), December, 2010.
This paper explores the application of Binary Neural Networks (BNN) in oblivious inference, a service provided by a server to mistrusting clients. Using this service, a client can obtain the inference result on her data from a trained model held by the server without disclosing the data or learning the model parameters. We make two contributions to this field. First, we devise light-weight cryptographic protocols designed specifically to exploit the unique characteristics of BNNs. Second, we present dynamic exploration of the runtime-accuracy tradeoff of BNNs in a single-shot training process. While previous works trained multiple BNNs with different computational complexities (which is cumbersome due to the slow convergence of BNNs), we train a single BNN that can perform inference under different computational budgets. Compared to CryptFlow2, the state-of-the-art in oblivious inference of nonbinary DNNs, our approach reaches 2× faster inference at the same accuracy. Compared to XONN, the state-of-the-art in oblivious inference of binary networks, we achieve 2× to 11× faster inference while obtaining higher accuracy.
 
Designing custom secure function evaluation compilers has been an active research area. However, intelligent adaptation of the integrated circuit synthesis tools outperforms these compilers. It is time for the custom compilers to embrace this trend.
 
We present TinyGarble2, a C++ framework for privacy-preserving computation through Yao’s Garbled Circuit (GC) protocol in both the honest-but-curious and the malicious security models. TinyGarble2 provides a rich library with arithmetic and logic building blocks for developing GC-based secure applications. The framework offers abstractions among three layers: the C++ program, the GC back-end, and the Boolean logic representation of the function being computed. TinyGarble2 thus allows the most optimized version of each pertinent component to be used. These abstractions, coupled with secure share transfer among the functions, make TinyGarble2 the fastest and most memory-efficient GC framework. In addition, the framework provides a library for Convolutional Neural Networks (CNN). Our evaluations show that TinyGarble2 is the fastest among the current end-to-end GC frameworks while also being scalable in terms of memory footprint. Moreover, it performs 18× faster on the CNN LeNet-5 compared to the existing scalable frameworks.
 
In this work, we provide an industry research view for approaching the design, deployment, and operation of trustworthy Artificial Intelligence (AI) inference systems. Such systems provide customers with timely, informed, and customized inferences to aid their decisions, while at the same time utilizing appropriate security protection mechanisms for AI models. Additionally, such systems should use Privacy-Enhancing Technologies (PETs) to protect customers' data at all times. To approach the subject, we start by introducing trends in AI inference systems. We continue by elaborating on the relationship between Intellectual Property (IP) and private data protection in such systems. Regarding the protection mechanisms, we survey the security and privacy building blocks instrumental in designing, building, deploying, and operating private AI inference systems. For example, we highlight opportunities and challenges in AI systems using trusted execution environments combined with more recent advances in cryptographic techniques to protect data in use. Finally, we outline areas of further development that require the global collective attention of industry, academia, and government researchers to sustain the operation of trustworthy AI inference systems.
 
Advances in customers' data privacy laws create pressures and pain points across the entire lifecycle of AI products. Practitioners such as data scientists and data engineers need to account for the correct use of privacy-enhancing technologies such as homomorphic encryption, secure multi-party computation, and trusted execution environments when they develop, test, and deploy products that embed AI models while providing data protection guarantees. In this work, we share the lessons learned during the development of frameworks that help data scientists and data engineers map their optimized workloads onto privacy-enhancing technologies seamlessly and correctly.
 
We present FASE, an FPGA accelerator for Secure Function Evaluation (SFE) that employs the well-known cryptographic protocol named Yao’s Garbled Circuit (GC). SFE allows two parties to jointly compute a function on their private data and learn the output without revealing their inputs to each other. FASE is designed to allow cloud servers to provide secure services to a large number of clients in parallel while preserving the privacy of the data from both sides. Current SFE accelerators either target specific applications, and therefore are not amenable to generic use, or have low throughput due to inefficient management of resources. In this work, we present a pipelined architecture along with an efficient scheduling scheme to ensure optimal usage of the available resources. The scheme is built around a simulator of the hardware design that schedules the workload and assigns the most suitable task to the encryption cores at each cycle. This, coupled with optimal management of the read and write cycles of the Block RAM on the FPGA, results in an improvement of at least two orders of magnitude in throughput per core for the reported benchmarks compared to the most recent generic GC accelerator. Moreover, our encryption core requires 17% fewer resources than the most recent secure GC realization.
 
Secure Multi-party Computation (MPC) is one of the most influential achievements of modern cryptography: it allows evaluation of an arbitrary function on private inputs from multiple parties without revealing the inputs. A crucial step in utilizing contemporary MPC protocols is to describe the function as a Boolean circuit. While efficient solutions have been proposed for the special case of two-party secure computation, the general case of more than two parties has not been addressed. This paper proposes MPCircuits, the first automated solution to devise the optimized Boolean circuit representation for any MPC function using hardware synthesis tools with new customized libraries that are scalable to multiple parties. MPCircuits creates a new end-to-end tool-chain to facilitate practical scalable MPC realization. To illustrate the practicality of MPCircuits, we design and implement a set of five circuits that represent real-world MPC problems. Our benchmarks inherently have different computational and communication complexities and are good candidates to evaluate MPC protocols. We also formalize the metrics by which a given protocol can be analyzed. We provide extensive experimental evaluations for these benchmarks, two of which are the first reported solutions in multi-party settings. As our experimental results indicate, MPCircuits reduces the computation time of MPC protocols by up to 4.2×.
 
We present ARM2GC, a novel secure computation framework based on Yao’s Garbled Circuit (GC) protocol and the ARM processor. It allows users to develop privacy-preserving applications using standard high-level programming languages (e.g., C) and compile them using off-the-shelf ARM compilers (e.g., gcc-arm). The main enabler of this framework is the introduction of SkipGate, an algorithm that dynamically omits the communication and encryption cost of the gates whose outputs are independent of the private data. SkipGate greatly enhances the performance of ARM2GC by omitting the costs of the gates associated with the instructions of the compiled binary, which is known by both parties involved in the computation. Our evaluation on benchmark functions demonstrates that ARM2GC not only outperforms the current GC frameworks that support high-level languages, but also achieves efficiency comparable to the best prior solutions based on hardware description languages. Moreover, in contrast to previous high-level frameworks with domain-specific languages and customized compilers, ARM2GC relies on the standard ARM compiler, which is rigorously verified and supports programs written in the standard syntax.
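
The idea of skipping gates whose outputs do not depend on private data can be illustrated with a small Python sketch. This is a simplified illustration under our own assumptions, not the SkipGate algorithm itself: the names `SECRET`, `eval_gate`, and `count_garbled_gates` are hypothetical, and the sketch only counts a gate as costly when both of its inputs depend on private data.

```python
from typing import Dict, List, Tuple, Union

SECRET = "secret"        # marker for a wire whose value depends on private data
Wire = Union[int, str]   # 0 or 1 for public wires, SECRET otherwise
Gate = Tuple[str, str, str, str]  # (output wire, operation, input a, input b)


def eval_gate(op: str, a: Wire, b: Wire) -> Wire:
    """Return a public bit if public inputs fix the output, else SECRET."""
    if a != SECRET and b != SECRET:
        return {"AND": a & b, "OR": a | b, "XOR": a ^ b}[op]
    if op == "AND" and (a == 0 or b == 0):
        return 0  # a public 0 forces the AND output to 0
    if op == "OR" and (a == 1 or b == 1):
        return 1  # a public 1 forces the OR output to 1
    return SECRET


def count_garbled_gates(gates: List[Gate], inputs: Dict[str, Wire]):
    """Walk the circuit in order and count gates with two secret inputs.

    Assumption of this sketch: a gate with at least one public input either
    has a constant output or forwards (possibly inverted) its secret input,
    so it is treated as free.
    """
    wires = dict(inputs)
    garbled = 0
    for out, op, a, b in gates:
        wa, wb = wires[a], wires[b]
        wires[out] = eval_gate(op, wa, wb)
        if wa == SECRET and wb == SECRET:
            garbled += 1
    return garbled, wires


# "op" and "zero" play the role of publicly known instruction bits.
demo_inputs = {"op": 1, "zero": 0, "x": SECRET, "y": SECRET}
demo_gates = [
    ("t0", "AND", "op", "x"),    # public 1 AND secret: forwards x, free
    ("t1", "AND", "zero", "y"),  # public 0 AND secret: constant 0, free
    ("t2", "OR", "t0", "t1"),    # one public input: free
    ("t3", "AND", "x", "y"),     # both inputs secret: must be garbled
]
demo_garbled, demo_wires = count_garbled_gates(demo_gates, demo_inputs)
```

In this toy circuit only one of the four gates has two secret inputs, which mirrors the observation that gates driven by the public binary need not incur garbling cost.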
 
In this paper, we present SHAIP, a secure Hamming distance based mutual authentication protocol. It allows an unlimited number of authentications by employing an intrinsic Physical Unclonable Function (PUF). PUFs are being increasingly employed for remote authentication of devices. Most of these devices have limited resources. Therefore, intrinsic PUFs are most suitable for this task as they can be built with little or no modification to the underlying hardware platform. One major drawback of the current authentication schemes is that they expose the PUF response. This makes intrinsic PUFs, which have a limited number of challenge-response pairs, unusable after a certain number of authentication sessions. Moreover, these schemes are one-way in the sense that they only allow one party, the prover, to authenticate herself to the verifier. We propose a symmetric mutual authentication scheme based on secure (privacy-preserving) computation of the Hamming distance between the PUF response from the remote device and the reference response stored at the verifier end. This allows both parties to authenticate each other without revealing their respective sets of inputs. We show that our scheme is effective with all state-of-the-art intrinsic PUFs. The proposed scheme is lightweight and does not require any modification to the underlying hardware.
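
For readers unfamiliar with Hamming-distance matching, the following Python sketch shows only the plaintext function that such a scheme would evaluate. In SHAIP the comparison is carried out under secure computation so that neither party sees the other's input; that part is not modeled here. The acceptance threshold and the function names are assumptions made for illustration.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def accept(response: bytes, reference: bytes, threshold: int) -> bool:
    """Accept when the noisy response is within `threshold` bits of the reference.

    `threshold` is a hypothetical parameter: PUF responses are noisy, so an
    exact match is not expected.
    """
    return hamming_distance(response, reference) <= threshold
```

For example, `accept(b"\x0f\xf0", b"\x0f\xf1", 1)` compares two 16-bit strings that differ in a single bit.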
 
This paper presents the first privacy-preserving localization method based on provably secure primitives for smart automotive systems. Using this method, a car that is lost due to unavailability of GPS can compute its location with assistance from three nearby cars while the locations of all the participating cars, including the lost car, remain private. Technological enhancement of modern vehicles, especially in navigation and communication, necessitates parallel enhancement in security and privacy. Previous approaches to maintaining user location privacy suffered from one or more of the following drawbacks: a trade-off between accuracy and privacy, one-sided privacy, and the need for a trusted third party that presents a single point of attack. The localization method presented here is one of the very first location-based services that eliminates all these drawbacks. Two protocols for computing the location are presented here, based on two Secure Function Evaluation (SFE) techniques that allow multiple parties to jointly evaluate a function on inputs which are encrypted to maintain privacy. The first one is based on the two-party protocol named Yao’s Garbled Circuit (GC). The second one is based on the Beaver-Micali-Rogaway (BMR) protocol that allows inputs from more than two parties. The two secure localization protocols exhibit trade-offs between performance and resilience against collusion. Along with devising the protocols, we design and optimize netlists for the functions required for location computation by leveraging conventional logic synthesis tools with custom libraries optimized for SFE. Proof-of-concept implementation of the protocol shows that the complete operation can be performed within only 355 ms. The fast computing time enables localization of even moving cars.
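
As a rough picture of what localization from three helpers can look like, the Python sketch below solves a textbook 2-D trilateration problem in the clear. The abstract does not specify the exact inputs or formula, so everything here is an assumption for illustration: that the lost car knows its distance to each helper, that positions are planar, and the function name `trilaterate`. The secure protocols would evaluate such a function without revealing the inputs, which this sketch does not attempt.

```python
from typing import Tuple

Point = Tuple[float, float]


def trilaterate(p1: Point, p2: Point, p3: Point,
                d1: float, d2: float, d3: float) -> Point:
    """Locate a point from its distances d1..d3 to three known points.

    Subtracting the circle equation around p1 from those around p2 and p3
    leaves two linear equations in (x, y), solved here by Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the three reference points are collinear")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y
```

The collinearity check matters in practice: three helpers on a straight line cannot pin down a unique planar position.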
 
This paper presents MAXelerator, the first hardware accelerator for privacy-preserving machine learning (ML) on cloud servers. Cloud-based ML is being increasingly employed in various data-sensitive scenarios. While it enhances both efficiency and quality of the service, it also raises concerns about the privacy of the users’ data. We create a practical privacy-preserving solution for matrix-based ML on cloud servers. We show that for the majority of ML applications, the privacy-sensitive computation boils down to either matrix multiplication, which is a repetition of Multiply-Accumulate (MAC), or the MAC itself. We design an FPGA architecture for privacy-preserving MAC to accelerate the ML computation based on the well-known Secure Function Evaluation protocol named Yao’s Garbled Circuit. MAXelerator demonstrates up to 57× improvement in throughput per core compared to the fastest existing GC framework. We corroborate the effectiveness of the accelerator with real-world case studies in privacy-sensitive scenarios.
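
The observation that matrix multiplication is a repetition of MAC can be made concrete with a short Python sketch. It runs in the clear and is only meant to show the structure of the computation; the function names `mac` and `matmul_by_mac` are hypothetical, and the garbled evaluation performed by the accelerator is not modeled.

```python
from typing import List

Matrix = List[List[int]]


def mac(acc: int, a: int, b: int) -> int:
    """One multiply-accumulate step: acc + a * b."""
    return acc + a * b


def matmul_by_mac(A: Matrix, B: Matrix) -> Matrix:
    """Multiply two matrices using nothing but repeated MAC operations."""
    if not A or not B or len(A[0]) != len(B):
        raise ValueError("inner dimensions must match")
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0
            for k in range(inner):
                acc = mac(acc, A[i][k], B[k][j])
            C[i][j] = acc
    return C
```

Each output entry costs `inner` MAC steps, so an accelerator that makes a single MAC cheap speeds up the whole product.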
 
This paper presents the first provably secure localization method for smart automotive systems. Using this method, a lost car can compute its location with assistance from three nearby cars while the locations of all the participating cars including the lost car remain private. This localization application is one of the very first location-based services that does not sacrifice accuracy to maintain privacy. The secure location is computed using a protocol utilizing Yao’s Garbled Circuit (GC) that allows two parties to jointly compute a function on their private inputs. We design and optimize GC netlists of the functions required for computation of location by leveraging conventional logic synthesis tools. Proof-of-concept implementation of the protocol shows that the complete operation can be performed within only 550 ms. The fast computing time enables practical localization of moving cars.
 
In the emerging era of Internet of Things (IoT) where various physical entities are spontaneously communicating with each other and sharing sensitive information, it is prohibitive to have a global entity for maintaining the security of the complex web against environmental variations and active attacks. Therefore, it is crucial that each entity has the capability of safeguarding its security features on its own. Methods based on harnessing the random identification and authentication from the physical device and environment, such as physical unclonable functions (PUFs) and True Random Number Generators (TRNGs), if securely run, are promising primitives for protecting lightweight IoT devices. This paper presents the first Built-In-Self-Test scheme for on-the-fly evaluation of PUFs that can also be utilized for assessing the desired statistical properties of TRNGs. Unlike earlier known PUF evaluation suites that were software-based and offline, our methodology enables online assessment of the pertinent statistical and security properties all in hardware. Specifically, the BIST structure is designed to evaluate two main properties of PUFs: unpredictability and stability. Our work is the first online test suite that thoroughly evaluates the internal health of the entropy source of TRNGs along with the statistical properties of the generated bit stream. Comprehensive real-time evaluation by the BIST method is able to ensure robustness and security of both TRNG and PUF in the face of operational, structural, and environmental fluctuations due to variations, aging, or adversarial acts. Proof-of-concept implementation of our BIST methodology in FPGA demonstrates its reasonable overhead, effectiveness, and practicality.
 
This paper introduces the first efficient, scalable, and practical method for privacy-preserving k-nearest neighbors (k-NN) search. The approach enables performing the widely used k-NN search in sensitive scenarios where none of the parties reveal their information while they can still cooperatively find the nearest matches. The privacy preservation is based on Yao’s garbled circuit (GC) protocol. In contrast with the existing GC approaches that only accept function descriptions as combinational circuits, we suggest using sequential circuits. This work introduces novel transformations, such that the sequential description can be evaluated by interfacing with the existing GC schemes that only accept combinational circuits. We demonstrate great efficiency in the memory required for realizing the secure k-NN search. The first-of-a-kind implementation of privacy-preserving k-NN, utilizing the Synopsys Design Compiler on a conventional Intel processor, demonstrates the applicability, efficiency, and scalability of the suggested methods.
 
We introduce TinyGarble, a novel automated methodology based on powerful logic synthesis techniques for generating and optimizing compressed Boolean circuits used in secure computation, such as Yao’s Garbled Circuit (GC) protocol. TinyGarble achieves an unprecedented level of compactness and scalability by using a sequential circuit description for GC. We introduce new libraries and transformations, such that our sequential circuits can be optimized and securely evaluated by interfacing with available garbling frameworks. The circuit compactness makes the memory footprint of the garbling operation fit in the processor cache, resulting in fewer cache misses and thereby fewer CPU cycles. Our proof-of-concept implementation of benchmark functions using TinyGarble demonstrates a high degree of compactness and scalability. We improve the results of existing automated tools for GC generation by orders of magnitude; for example, TinyGarble can compress the memory footprint required for 1024-bit multiplication by a factor of 4,172, while decreasing the number of non-XOR gates by 67%. Moreover, with TinyGarble we are able to implement functions that have never been reported before, such as SHA-3. Finally, our sequential description enables us to design and realize a garbled processor, using the MIPS I instruction set, for private function evaluation. To the best of our knowledge, this is the first scalable emulation of a general purpose processor.
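
Why a sequential description is more compact can be seen in a small Python sketch of an adder. A combinational n-bit adder needs n full adders in its description, whereas a sequential one describes a single full adder and reuses it for n clock cycles, keeping the carry as state. The sketch is our own illustration in plaintext, with hypothetical names `full_adder` and `sequential_add`; it is not TinyGarble code and performs no garbling.

```python
from typing import Tuple


def full_adder(a: int, b: int, carry_in: int) -> Tuple[int, int]:
    """One-bit full adder: returns (sum bit, carry out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out


def sequential_add(x: int, y: int, n_bits: int) -> int:
    """Add two n-bit numbers by clocking one full adder n times.

    The circuit description stays the same size for any n_bits; only the
    number of cycles grows. The result is reduced modulo 2**n_bits.
    """
    carry = 0   # the only state carried between clock cycles
    result = 0
    for i in range(n_bits):
        a = (x >> i) & 1
        b = (y >> i) & 1
        s, carry = full_adder(a, b, carry)
        result |= s << i
    return result
```

For instance, `sequential_add(13, 7, 8)` clocks the same one-bit circuit eight times instead of instantiating eight adders.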
 
Physical Unclonable Functions (PUFs) are of increasing importance due to their many hardware security applications including chip fingerprinting, metering, authentication, anticounterfeiting, and supply-chain tracing, e.g., DARPA SHIELD. This paper presents BIST-PUF, the first built-in-self-test (BIST) methodology for online evaluation of weak and strong PUFs. BIST-PUF provides a paradigm shift in the evaluation of unclonable circuit identifiers: unlike earlier known PUF evaluation suites that are software-based and offline, BIST-PUF enables on-the-fly assessment of the desired PUF properties all in hardware. More specifically, the BIST-PUF structure is designed to evaluate two main properties of PUFs, namely unpredictability and stability. These properties are important for ensuring robustness and security in the face of operational, structural, and environmental fluctuations due to variations, aging, or adversarial acts. For BIST-PUF unpredictability evaluation, we identify and adopt the tests of randomness that are amenable to hardware implementation. For stability assessment, BIST-PUF suggests three distinct methods, namely, sensor-based, parametric interrogation, and multiple interrogations. Proof-of-concept implementation of BIST-PUF in FPGA demonstrates its low overhead, effectiveness, and practicality.
 
Ultra-wideband (UWB) microwave imaging is a promising technique for detecting early-stage breast cancer, which exploits the significant contrast in dielectric properties between normal and malignant breast tissues. In this paper, we propose a new modified compensation method and beamforming technique for microwave imaging. We use a three-dimensional (3-D) Finite Integration Technique (FIT) based breast model, with normal breast tissue, supported on a layer of chest muscle and covered by a thin layer of skin. A small (1 mm diameter) tumor is placed within the breast tissue layer. A pair of rounded-edge bow-tie antennas in a crossed position is used for transmitting and receiving microwave signals. This antenna pair is then placed at different positions over the breast surface, and the incident and backscattered signals at each position are stored. The backscattered signals are then processed to eliminate artifacts. Finally, they are passed through the beamformer and an image is formed. The beamformer is designed with adaptive weighting to compensate for both propagation attenuation and lossy medium effects. Instead of the traditional delay-and-sum approach, a new delay-and-product technique is used in beamforming. This modified beamforming approach is shown to outperform its previous counterparts in terms of resolution and sensitivity.