Supplemental webpage for
  Unsupervised Melody Style Conversion
  by Eita Nakamura, Kentaro Shibata, Ryo Nishikimi, Kazuyoshi Yoshii
  (Paper submitted to ICASSP 2019)

1. Data

We use three sets of data belonging to different music categories.

2. Unsupervised learning of music styles

We train a transposition-symmetric torus Markov mixture model (TSTMMixM) (see paper) on the dataset of each music category. Each component model is expected to represent one music style within its category. Some examples of learned styles are visualized below.
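As a rough illustration of how a Markov mixture model relates a melody to its component styles, the sketch below computes the posterior probability of each component for one symbol sequence. It is a generic first-order Markov mixture only: the transposition symmetry and torus structure of the TSTMMixM are described in the paper and are not reproduced here. The function names, the dictionary layout, and the two toy components are assumptions made for this example.

```python
import math

def sequence_log_likelihood(seq, initial, transition):
    """Log-probability of seq under a first-order Markov chain.

    Assumes every probability that is looked up is strictly positive.
    """
    logp = math.log(initial[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        logp += math.log(transition[prev][cur])
    return logp

def component_posteriors(seq, components, weights):
    """Posterior probability of each mixture component given seq."""
    log_joint = [
        math.log(w) + sequence_log_likelihood(seq, c["initial"], c["transition"])
        for c, w in zip(components, weights)
    ]
    top = max(log_joint)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - top) for v in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Two hypothetical components over the toy symbols "a" and "b".
components = [
    {"initial": {"a": 0.5, "b": 0.5},
     "transition": {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.9, "b": 0.1}}},
    {"initial": {"a": 0.5, "b": 0.5},
     "transition": {"a": {"a": 0.1, "b": 0.9}, "b": {"a": 0.1, "b": 0.9}}},
]
posteriors = component_posteriors(["a", "a", "a"], components, [0.5, 0.5])
```

In this toy case the first component, which favors staying on "a", receives almost all of the posterior mass for the sequence a, a, a.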

2.1. Styles in pitch organization

Pitch-class transition probabilities obtained by integrating out the rhythmic variables. Transition (bigram) probabilities are represented by bands.
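The marginalization mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a hypothetical joint bigram table over (pitch class, metrical position) states and a distribution over those states used to weight the sum. The names `marginal_pitch_transitions`, `joint`, and `prior`, and the toy numbers, are invented for this example.

```python
from collections import defaultdict

def marginal_pitch_transitions(joint, prior):
    """Collapse a joint (pitch class, beat) bigram model to pitch classes only.

    joint[(p, b)][(p2, b2)] is P(p2, b2 | p, b); prior[(p, b)] is P(p, b).
    Returns out[p][p2] = P(p2 | p), summing over both beat variables.
    """
    numer = defaultdict(lambda: defaultdict(float))
    denom = defaultdict(float)
    for (p, b), weight in prior.items():
        denom[p] += weight
        for (p2, _b2), prob in joint.get((p, b), {}).items():
            numer[p][p2] += weight * prob
    return {
        p: {p2: v / denom[p] for p2, v in row.items()}
        for p, row in numer.items()
        if denom[p] > 0
    }

# Toy model with two pitch classes (0 and 7) and two beat positions (0 and 1).
joint = {
    (0, 0): {(0, 1): 0.5, (7, 1): 0.5},
    (0, 1): {(7, 0): 1.0},
    (7, 0): {(0, 1): 1.0},
    (7, 1): {(0, 0): 0.25, (7, 0): 0.75},
}
prior = {(0, 0): 0.25, (0, 1): 0.25, (7, 0): 0.25, (7, 1): 0.25}
pitch_bigram = marginal_pitch_transitions(joint, prior)
```

Swapping the roles of the two state variables gives the analogous metrical transition table with the pitch classes summed out.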

Classical music style
(Major diatonic scale)
Classical music style
(Minor diatonic scale)
J-pop style 1
(Major diatonic scale)
J-pop style 2
(Minor diatonic scale)
Enka music style 1
(Major pentatonic scale)
Enka music style 2
(Minor pentatonic scale)
It is worth emphasizing that these musical scales were inferred in an unsupervised manner, without any annotation of the tonic or mode!

2.2. Styles in rhythmic organization

Metrical transition probabilities obtained by integrating out the pitch-class variables. Transition (bigram) probabilities are represented by bands. As on a clock face, each metrical (beat) position in a bar (4/4 time) is represented as a point on a circle. The 0 o'clock, 3 o'clock, 6 o'clock, and 9 o'clock positions indicate the quarter-note beats.
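The clock-style layout can be reproduced with a simple mapping from metrical position to angle. The sketch below is only an illustration: the number of positions per bar is not stated on this page, so `positions_per_bar` is a free parameter, and the value 16 (a 16th-note grid) in the example is an assumption. The function names are hypothetical.

```python
import math

def clock_angle(position, positions_per_bar):
    """Angle in degrees, measured clockwise from the top of the circle."""
    return 360.0 * (position % positions_per_bar) / positions_per_bar

def clock_xy(position, positions_per_bar, radius=1.0):
    """Cartesian coordinates of a metrical position on the clock face."""
    theta = math.radians(clock_angle(position, positions_per_bar))
    # Clockwise from the top: x grows with sin, y with cos.
    return radius * math.sin(theta), radius * math.cos(theta)

# Assumed 16th-note grid: the quarter-note beats fall on positions 0, 4, 8, 12.
quarter_beat_angles = [clock_angle(p, 16) for p in (0, 4, 8, 12)]
second_beat_xy = clock_xy(4, 16)
```

With this grid the four quarter-note beats land a quarter turn apart, matching the 0, 3, 6, and 9 o'clock positions described above.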

Classical music style
J-pop style 1
(8th-note rhythm)
J-pop style 2
(16th-note rhythm)
Enka music style 1
(Mixed 8th-note rhythm)
Enka music style 2
(Dotted 8th-note rhythm)

3. Sound examples

Each original melody (8 bars long) is converted to the target style. For comparison, we tested the following three methods (see paper for details): TMM+SEM, TSTMMixM+SEM, and TSTMMixM+REM.

3.1. Classical to J-pop / Enka

Original melody (Classical music style)

Target: J-pop (TMM+SEM)

Target: J-pop (TSTMMixM+SEM)

Target: J-pop (TSTMMixM+REM)

Target: Enka (TMM+SEM)

Target: Enka (TSTMMixM+SEM)

Target: Enka (TSTMMixM+REM)

3.2. Classical to J-pop / Enka

Original melody (Classical music style)

Target: J-pop (TMM+SEM)

Target: J-pop (TSTMMixM+SEM)

Target: J-pop (TSTMMixM+REM)

Target: Enka (TMM+SEM)

Target: Enka (TSTMMixM+SEM)

Target: Enka (TSTMMixM+REM)

3.3. J-pop to Classical / Enka

Original melody (J-pop style)

Target: Classical (TMM+SEM)

Target: Classical (TSTMMixM+SEM)

Target: Classical (TSTMMixM+REM)

Target: Enka (TMM+SEM)

Target: Enka (TSTMMixM+SEM)

Target: Enka (TSTMMixM+REM)

3.4. J-pop to Classical / Enka

Original melody (J-pop style)

Target: Classical (TMM+SEM)

Target: Classical (TSTMMixM+SEM)

Target: Classical (TSTMMixM+REM)

Target: Enka (TMM+SEM)

Target: Enka (TSTMMixM+SEM)

Target: Enka (TSTMMixM+REM)

3.5. Enka to Classical / J-pop

Original melody (Enka music style)

Target: Classical (TMM+SEM)

Target: Classical (TSTMMixM+SEM)

Target: Classical (TSTMMixM+REM)

Target: J-pop (TMM+SEM)

Target: J-pop (TSTMMixM+SEM)

Target: J-pop (TSTMMixM+REM)

3.6. Enka to Classical / J-pop

Original melody (Enka music style)

Target: Classical (TMM+SEM)

Target: Classical (TSTMMixM+SEM)

Target: Classical (TSTMMixM+REM)

Target: J-pop (TMM+SEM)

Target: J-pop (TSTMMixM+SEM)

Target: J-pop (TSTMMixM+REM)

4. Subjective evaluation

4.1. Setup

We conducted a subjective evaluation test to measure the quality of melody style conversion. Here is the page used for the listening evaluation.

Ten evaluators who listen to music more than one hour a day participated in the experiment. They listened to the arranged melodies above and rated each arrangement on several scores, including style match, similarity, naturalness, and attractiveness.

4.2. Result

The mean scores of all the metrics improved with each refinement of the method. In particular, the style match score improved by 0.5 (p-value < 10^-5, t-test) with the refined language model (M1 vs M2), and the similarity score improved by 0.28 (p-value = 3.3 × 10^-3, t-test) with the refined edit model (M2 vs M3). The improvements in the naturalness and attractiveness scores are also statistically significant. These results demonstrate the effectiveness of the TSTMMixM and the refined edit model.
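For readers who want to run a similar comparison on their own ratings, the sketch below computes a paired t statistic between two methods. This page does not say which variant of the t-test was used, so the paired form is an assumption, and the ratings in the example are made-up placeholders, not data from this study. Converting the statistic to a p-value requires the Student t distribution, which a statistics package can provide.

```python
import math
import statistics

def paired_t_statistic(scores_a, scores_b):
    """Return (mean difference, t statistic, degrees of freedom).

    scores_a and scores_b hold ratings of the same items under two methods.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length samples with at least 2 items")
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd / math.sqrt(len(diffs)))
    return mean_diff, t, len(diffs) - 1

# Placeholder ratings for illustration only (not from the listening test).
ratings_method_a = [2, 3, 3, 4, 2, 3]
ratings_method_b = [3, 3, 4, 4, 3, 4]
mean_diff, t_stat, dof = paired_t_statistic(ratings_method_a, ratings_method_b)
```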