GuardSet-X: Massive Multi-Domain Safety Policy-Grounded Guardrail Dataset.
CoRR, June, 2025
ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning.
CoRR, March, 2025
MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models.
CoRR, March, 2025
FairGen: Controlling Sensitive Attributes for Fair Generations in Diffusion Models via Adaptive Latent Guidance.
CoRR, March, 2025
MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
R2-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
MgSvF: Multi-Grained Slow versus Fast Framework for Few-Shot Class-Incremental Learning.
IEEE Trans. Pattern Anal. Mach. Intell., March, 2024
AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models.
CoRR, 2024
AdvWeb: Controllable Black-box Attacks on VLM-powered Web Agents.
CoRR, 2024
R2-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning.
CoRR, 2024
Certifiably Byzantine-Robust Federated Conformal Prediction.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
COLEP: Certifiably Robust Learning-Reasoning Conformal Prediction via Probabilistic Circuits.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
FaShapley: Fast and Approximated Shapley Based Model Pruning Towards Certifiably Robust DNNs.
Proceedings of the 2023 IEEE Conference on Secure and Trustworthy Machine Learning, 2023
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Label-Assemble: Leveraging Multiple Datasets with Partial Labels.
Proceedings of the 20th IEEE International Symposium on Biomedical Imaging, 2023
Certifying Some Distributional Fairness with Subpopulation Decomposition.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Fairness in Federated Learning via Core-Stability.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Data, Assemble: Leveraging Multiple Datasets with Heterogeneous and Partial Labels.
CoRR, 2021