We consider the problem of multi-channel single-speaker blind dereverberation, where multi-channel mixtures are used to recover the clean anechoic speech. To solve this problem, we propose USD-DPS, Unsupervised Speech Dereverberation via Diffusion Posterior Sampling. USD-DPS uses an unconditional clean-speech diffusion model as a strong prior and solves the problem by posterior sampling. At each diffusion sampling step, we estimate the room impulse responses (RIRs) of all microphone channels, which are then used to enforce a multi-channel mixture consistency constraint for diffusion guidance. For multi-channel RIR estimation, we estimate the reference-channel RIR by optimizing the parameters of a sub-band RIR signal model with the Adam optimizer, and we estimate the non-reference channels' RIRs analytically using forward convolutive prediction (FCP). We found that this combination provides a good balance between sampling efficiency and RIR prior modeling, and it achieves superior performance among unsupervised dereverberation approaches.
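The analytical FCP step and the mixture consistency term can be illustrated with a short NumPy sketch. It assumes STFT-domain signals of shape (frequency, frames), a per-frequency filter spanning `n_taps` frames, and an FCP-style inverse-power weighting with a hypothetical floor parameter `xi`. These choices, and the helper names `fcp_filter`, `apply_filter`, and `mixture_consistency_loss`, are illustrative assumptions rather than the USD-DPS implementation. For brevity the sketch estimates every channel's filter with FCP, whereas USD-DPS fits the reference-channel RIR with a parametric sub-band model and Adam.

```python
import numpy as np


def delayed_copies(ref_row, n_taps):
    """Stack delayed copies of one frequency bin: column k is ref_row shifted by k frames."""
    T = ref_row.shape[0]
    X = np.zeros((T, n_taps), dtype=complex)
    for k in range(n_taps):
        X[k:, k] = ref_row[: T - k]
    return X


def fcp_filter(ref_stft, mix_stft, n_taps=10, xi=1e-4):
    """Per-frequency filter g with mix[f, t] ~= sum_k g[f, k] * ref[f, t - k].

    ref_stft, mix_stft: complex arrays of shape (F, T).
    The weight 1 / (|mix|^2 + xi * max|mix|^2) is an FCP-style choice (assumption).
    """
    F = ref_stft.shape[0]
    g = np.zeros((F, n_taps), dtype=complex)
    for f in range(F):
        X = delayed_copies(ref_stft[f], n_taps)
        power = np.abs(mix_stft[f]) ** 2
        w = 1.0 / (power + xi * power.max() + 1e-12)
        sw = np.sqrt(w)
        # Weighted least squares: min_g sum_t w_t |mix[f, t] - X[t] @ g|^2,
        # solved as ordinary least squares on the row-scaled system.
        g[f] = np.linalg.lstsq(sw[:, None] * X, sw * mix_stft[f], rcond=None)[0]
    return g


def apply_filter(ref_stft, g):
    """Convolve each frequency bin of ref_stft with its filter along the frame axis."""
    F, T = ref_stft.shape
    out = np.zeros((F, T), dtype=complex)
    for k in range(g.shape[1]):
        out[:, k:] += g[:, k : k + 1] * ref_stft[:, : T - k]
    return out


def mixture_consistency_loss(ref_stft, mix_stfts, n_taps=10):
    """Sum over microphones of ||Y_m - G_m * S||^2, with each G_m estimated by FCP (simplification)."""
    loss = 0.0
    for mix in mix_stfts:
        g = fcp_filter(ref_stft, mix, n_taps)
        loss += float(np.sum(np.abs(mix - apply_filter(ref_stft, g)) ** 2))
    return loss
```

Because each filter is a closed-form weighted least-squares solution, no iterative optimization is needed per channel, which is consistent with the efficiency argument above for using FCP on the non-reference channels.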
Here are the main results of our paper on the 8-channel WSJ0CAM-DEREVERB dataset.
Here are the hyperparameter tuning tables for the WSJ0CAM-DEREVERB dataset. We experiment with a set of values for ζ, λ, and K.
Here is the run-time comparison of various models on the WSJ0CAM-DEREVERB dataset.
In the following demos, we evaluate a subset of the methods from Table 1 (shown above):
We take the first 10 samples from the WSJ0CAM-DEREVERB dataset and demo the inference results obtained with 1, 2, 4, or 8 channels of each sample. For 1-channel dereverberation, we use the mixture signal captured by microphone 1; for 2-channel dereverberation, we use microphones 1 and 4; for 4-channel dereverberation, we use microphones 1, 3, 5, and 7; and for 8-channel dereverberation, all 8 microphones are used.
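The microphone subsets above can be written as a small lookup table. This sketch assumes each recording is stored as a NumPy array of shape (channels, samples) whose row i holds microphone i + 1; the names `MIC_SUBSETS` and `select_channels` are hypothetical and not part of any released code.

```python
import numpy as np

# Microphone subsets used in the demos (1-based microphone numbers, as in the text).
MIC_SUBSETS = {1: [1], 2: [1, 4], 4: [1, 3, 5, 7], 8: [1, 2, 3, 4, 5, 6, 7, 8]}


def select_channels(multichannel_wav, n_channels):
    """Return the rows of a (channels, samples) array used for an n-channel demo."""
    if n_channels not in MIC_SUBSETS:
        raise ValueError(f"unsupported channel count: {n_channels}")
    idx = [m - 1 for m in MIC_SUBSETS[n_channels]]  # convert to 0-based row indices
    return multichannel_wav[idx]
```

Spacing the 2- and 4-channel subsets across the array, rather than taking adjacent microphones, keeps a wider aperture for the reduced-channel runs.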
Utterance ID=000000
Mixture | Clean |
---|---|
![]() | ![]() |
Methods | 1-Channel | 2-Channel | 4-Channel | 8-Channel |
---|---|---|---|---|
WPE | ![]() | ![]() | ![]() | ![]() |
DNN-WPE | ![]() | ![]() | ![]() | |
BUDDy (NCSN++) | ![]() | | | |
BUDDy (1D-UNet) | ![]() | | | |
MC-BUDDy | ![]() | ![]() | ![]() | |
USD-DPS | ![]() | ![]() | ![]() | |
NBC | ![]() | ![]() | ![]() | |
Utterance ID=000001
Mixture | Clean |
---|---|
![]() | ![]() |
Methods | 1-Channel | 2-Channel | 4-Channel | 8-Channel |
---|---|---|---|---|
WPE | ![]() | ![]() | ![]() | ![]() |
DNN-WPE | ![]() | ![]() | ![]() | |
BUDDy (NCSN++) | ![]() | | | |
BUDDy (1D-UNet) | ![]() | | | |
MC-BUDDy | ![]() | ![]() | ![]() | |
USD-DPS | ![]() | ![]() | ![]() | |
NBC | ![]() | ![]() | ![]() | |
Utterance ID=000002
Mixture | Clean |
---|---|
![]() | ![]() |
Methods | 1-Channel | 2-Channel | 4-Channel | 8-Channel |
---|---|---|---|---|
WPE | ![]() | ![]() | ![]() | ![]() |
DNN-WPE | ![]() | ![]() | ![]() | |
BUDDy (NCSN++) | ![]() | | | |
BUDDy (1D-UNet) | ![]() | | | |
MC-BUDDy | ![]() | ![]() | ![]() | |
USD-DPS | ![]() | ![]() | ![]() | |
NBC | ![]() | ![]() | ![]() | |
Utterance ID=000003
Mixture | Clean |
---|---|
![]() | ![]() |
Methods | 1-Channel | 2-Channel | 4-Channel | 8-Channel |
---|---|---|---|---|
WPE | ![]() | ![]() | ![]() | ![]() |
DNN-WPE | ![]() | ![]() | ![]() | |
BUDDy (NCSN++) | ![]() | | | |
BUDDy (1D-UNet) | ![]() | | | |
MC-BUDDy | ![]() | ![]() | ![]() | |
USD-DPS | ![]() | ![]() | ![]() | |
NBC | ![]() | ![]() | ![]() | |
Utterance ID=000004
Mixture | Clean |
---|---|
![]() | ![]() |
Methods | 1-Channel | 2-Channel | 4-Channel | 8-Channel |
---|---|---|---|---|
WPE | ![]() | ![]() | ![]() | ![]() |
DNN-WPE | ![]() | ![]() | ![]() | |
BUDDy (NCSN++) | ![]() | | | |
BUDDy (1D-UNet) | ![]() | | | |
MC-BUDDy | ![]() | ![]() | ![]() | |
USD-DPS | ![]() | ![]() | ![]() | |
NBC | ![]() | ![]() | ![]() | |
Utterance ID=000005
Mixture | Clean |
---|---|
![]() | ![]() |
Methods | 1-Channel | 2-Channel | 4-Channel | 8-Channel |
---|---|---|---|---|
WPE | ![]() | ![]() | ![]() | ![]() |
DNN-WPE | ![]() | ![]() | ![]() | |
BUDDy (NCSN++) | ![]() | | | |
BUDDy (1D-UNet) | ![]() | | | |
MC-BUDDy | ![]() | ![]() | ![]() | |
USD-DPS | ![]() | ![]() | ![]() | |
NBC | ![]() | ![]() | ![]() | |
Utterance ID=000006
Mixture | Clean |
---|---|
![]() | ![]() |
Methods | 1-Channel | 2-Channel | 4-Channel | 8-Channel |
---|---|---|---|---|
WPE | ![]() | ![]() | ![]() | ![]() |
DNN-WPE | ![]() | ![]() | ![]() | |
BUDDy (NCSN++) | ![]() | | | |
BUDDy (1D-UNet) | ![]() | | | |
MC-BUDDy | ![]() | ![]() | ![]() | |
USD-DPS | ![]() | ![]() | ![]() | |
NBC | ![]() | ![]() | ![]() | |
Utterance ID=000007
Mixture | Clean |
---|---|
![]() | ![]() |
Methods | 1-Channel | 2-Channel | 4-Channel | 8-Channel |
---|---|---|---|---|
WPE | ![]() | ![]() | ![]() | ![]() |
DNN-WPE | ![]() | ![]() | ![]() | |
BUDDy (NCSN++) | ![]() | | | |
BUDDy (1D-UNet) | ![]() | | | |
MC-BUDDy | ![]() | ![]() | ![]() | |
USD-DPS | ![]() | ![]() | ![]() | |
NBC | ![]() | ![]() | ![]() | |
Utterance ID=000008
Mixture | Clean |
---|---|
![]() | ![]() |
Methods | 1-Channel | 2-Channel | 4-Channel | 8-Channel |
---|---|---|---|---|
WPE | ![]() | ![]() | ![]() | ![]() |
DNN-WPE | ![]() | ![]() | ![]() | |
BUDDy (NCSN++) | ![]() | | | |
BUDDy (1D-UNet) | ![]() | | | |
MC-BUDDy | ![]() | ![]() | ![]() | |
USD-DPS | ![]() | ![]() | ![]() | |
NBC | ![]() | ![]() | ![]() | |
Utterance ID=000009
Mixture | Clean |
---|---|
![]() | ![]() |
Methods | 1-Channel | 2-Channel | 4-Channel | 8-Channel |
---|---|---|---|---|
WPE | ![]() | ![]() | ![]() | ![]() |
DNN-WPE | ![]() | ![]() | ![]() | |
BUDDy (NCSN++) | ![]() | | | |
BUDDy (1D-UNet) | ![]() | | | |
MC-BUDDy | ![]() | ![]() | ![]() | |
USD-DPS | ![]() | ![]() | ![]() | |
NBC | ![]() | ![]() | ![]() | |