Audio Collection

Collection Scenes | Metadata | Topics | Device | Software | Output format | Distance
Voice-controlled command | Speaker ID, Name, Age, Gender, Demographic Info, Mother tongue, Script, Time stamp, Device used, Distance | Wake-up word, Control commands | Mobile phone | Collection app | Bit depth, Frequency, Channel, other format | 0.3-3m
Free conversation | | By industry field | Laptop, Mobile phone, Microphone | Target application, Collection app | | 0.5-1m
Single-person recording | | Designated script | | Collection app | |
Environmental noise | Date, Scenario | N/A | Mobile phone | | | 2-5m
TTS | Professional voice actor with designated tone | Designated script | High-fidelity microphone | Studio | | 0.3m
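The metadata fields listed for the collection scenes can be captured as one record per recording. The following is a minimal sketch in Python, not a prescribed schema: the class name `RecordingMetadata`, the field types, the ISO 8601 timestamp convention, and the helper `to_json` are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RecordingMetadata:
    """Hypothetical per-recording metadata record (field types are assumed)."""
    speaker_id: str
    name: str
    age: int
    gender: str
    demographic_info: str
    mother_tongue: str
    script: str        # identifier of the script that was read
    timestamp: str     # assumed to be an ISO 8601 string
    device_used: str
    distance_m: float  # speaker-to-microphone distance in metres


def to_json(record: RecordingMetadata) -> str:
    """Serialize one record to a JSON string with stable key order."""
    return json.dumps(asdict(record), ensure_ascii=False, sort_keys=True)
```

A record built this way can be stored next to its audio file, so that each delivered item carries its own metadata.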

Requirement List

 

Factors

Definition

Applied Scenarios

Specific scenarios in which the project's output will be used, stated so that all stakeholders share the same understanding

Workload

Workload per participant, proportion of short versus long audio by duration, and turnaround time (TAT) for the whole project

Script

Provided by client, generated by OSDT, generated by individuals

Language

Native languages (mother tongues), accents

Domains

History, Agriculture, Culture, Finance, Holiday and Leisure, Art, Comedy, Drama, Hospitality, Aviation, Entertainment, Gaming, Information and Technology, Banking, Crime and Justice, Health, Insurance, Religion, Study, Retail, Technology, Weather, Politics, Spirituality, Travel, etc.

Resource

Ratio of male to female, race, geographic location, age group

Measurement Units

Total audio duration, effective audio duration, number of items (measured in seconds, minutes, hours, sentences, or utterances)
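The difference between total and effective duration can be made concrete with a small calculation. The sketch below assumes that effective duration means the summed length of segments marked as valid speech, which the requirement list itself does not define. The `Recording` class and the `summarize` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    total_seconds: float
    # (start, end) times in seconds of segments counted as valid speech.
    speech_segments: list[tuple[float, float]]


def summarize(recordings: list[Recording]) -> dict:
    """Report item count, total hours, and effective (valid speech) hours."""
    total = sum(r.total_seconds for r in recordings)
    effective = sum(
        end - start for r in recordings for start, end in r.speech_segments
    )
    return {
        "items": len(recordings),
        "total_hours": total / 3600,
        "effective_hours": effective / 3600,
    }
```

Reporting both figures side by side makes it clear which unit a delivery target refers to.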

Collection Devices

Mobile phones, recording studios, laptops, etc.

A whitelist of applicable devices is required (for example, device types and configuration requirements).

Collection Software

Provided by the client or by OSDT; delivered as a web-based link or as an application to install

Collection Scenarios

Indoor or outdoor, inside or outside a car, window open or closed, office environment, meeting environment

Collection Rules

Single-person or multi-person conversations, speech speed, tone

Specify whether repeats, restarts, filler words, etc. should or should not be present. Speakers should use their authentic voice and style.

PII

Do not share any real personally identifiable information during the recording (e.g. real names, addresses, phone numbers, etc.)

Collection Distance

Close-range headphones, close-range earpieces, long-range, etc.

Audio Quality

 

1. Background noise requirements (SNR, dB)
2. Sampling rate (16 kHz, 24 kHz, 44.1 kHz, 48 kHz); the achievable rate is limited by the hardware, not the software
3. Bit depth (16, 24, or 32 bits)
4. Channels (mono or stereo)
5. File format (wav, mp3, etc.)
6. File naming rules
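Several of the audio-quality parameters above can be checked automatically before delivery. The sketch below uses only Python's standard `wave` module, which reads PCM WAV files. The `AudioSpec` fields and defaults and the function names `check_wav` and `snr_db` are hypothetical; the real target values would come from the agreed requirement list.

```python
import math
import wave
from dataclasses import dataclass


@dataclass
class AudioSpec:
    """Hypothetical target format; the defaults are placeholders."""
    sample_rate_hz: int = 16000
    bit_depth: int = 16
    channels: int = 1  # 1 = mono, 2 = stereo


def check_wav(path: str, spec: AudioSpec) -> list[str]:
    """Return a list of mismatches between a PCM WAV file and the spec."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        # getsampwidth() returns bytes per sample, so multiply by 8 for bits.
        bits = wav.getsampwidth() * 8
        channels = wav.getnchannels()
    if rate != spec.sample_rate_hz:
        problems.append(f"sample rate {rate} Hz, expected {spec.sample_rate_hz} Hz")
    if bits != spec.bit_depth:
        problems.append(f"bit depth {bits}, expected {spec.bit_depth}")
    if channels != spec.channels:
        problems.append(f"{channels} channel(s), expected {spec.channels}")
    return problems


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels from mean signal and noise power."""
    return 10.0 * math.log10(signal_power / noise_power)
```

An empty list from `check_wav` means the file matches the spec on these three properties. Background-noise and file-naming checks would still need their own rules.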

Acceptance

Frequency: by batch or one-time final acceptance.
Review ratio: full review or random sampling by the client.
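The choice between full review and random sampling can be expressed as a single review ratio applied to each batch. The following sketch is one possible way to draw such a sample in Python. The function name `select_for_review`, the rounding-up rule, and the optional `seed` parameter are assumptions for illustration, not an agreed acceptance procedure.

```python
import math
import random


def select_for_review(item_ids, review_ratio, seed=None):
    """Pick the items to review in one batch.

    A review_ratio of 1.0 means full review; smaller values mean random sampling.
    """
    if not 0.0 < review_ratio <= 1.0:
        raise ValueError("review_ratio must be in (0, 1]")
    items = list(item_ids)
    if review_ratio == 1.0:
        return items
    # Round up so that a small non-empty batch still gets at least one item reviewed.
    sample_size = math.ceil(len(items) * review_ratio)
    # A fixed seed makes the selection reproducible for both parties.
    return random.Random(seed).sample(items, sample_size)
```

For example, `select_for_review(range(20), 0.25, seed=1)` selects 5 of the 20 items, and the same seed gives the same selection on every run.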

Project Experience