Vertically Segmented Target Zone Based Audio Fingerprinting

An audio fingerprint is a set of perceptual features that uniquely identifies an audio file. Audio fingerprinting has applications in broadcast monitoring, metadata collection, royalty tracking, and more. Audio fingerprinting systems are highly sensitive to noise, compression, and other modifications present in the audio. Pitch shifting is one such modification; common real-world scenarios where it occurs include radio broadcasts, DJ sets, and deliberate alterations. Because pitch shifting scales the spectral content of the original audio, matching a pitch-shifted query to its unmodified original is challenging. This thesis proposes a Shazam-based audio fingerprinting system that is resistant to pitch shifting.

The proposed approach uses the constant-Q transform (CQT) to turn the scaling effect of pitch shifting into a vertical translation. From the CQT spectrogram, it picks triplets of spectral peaks to encode pitch-shift-resistant fingerprint hashes. Vertically segmented target zones are used to organize the spectral peaks into triplets. By increasing the locality of the generated hashes, vertical segmentation of the spectrogram minimizes the effect of pitch shifting. A fingerprint hashing scheme that leverages these vertically segmented target zones is proposed.

A total of 42,000 query audio clips and a reference database of 3,000 freely available songs were used to evaluate the proposed approach and the two chosen baselines, Panako and Quad. The results show that the proposed approach handles pitch shifts from -11% to +12%, except at -8%, -3%, +3%, and +9%. Panako identified queries with pitch shifts from -6% to +6%, except at -3% and +3%. Quad, on the other hand, handled pitch shifts from -12% to +7% with no such drops.

The proposed approach is also robust to linear speed modifications from -6% to +12%, a significant improvement over Panako, which handles only -4% to +8%. Quad showed better robustness to linear speed modification, handling rates from -16% to +12%. However, Quad took on average three times longer than the proposed approach to process a single query. Moreover, the proposed approach is robust to common audio effects such as echo, tremolo, flanger, band-pass filtering, and chorus, whereas Quad suffered significant accuracy drops for chorus, flanger, and tremolo.
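
To see why the CQT turns pitch shifting into a vertical translation: CQT bins are spaced logarithmically, so bin k corresponds to f_k = f_min * 2^(k/B), where B is the number of bins per octave. Multiplying every frequency by a factor alpha therefore adds the same offset B * log2(alpha) to every bin index, whatever the frequency. The short Python sketch below checks this numerically. The helper names cqt_bin and pitch_shift_bins, the values fmin_hz=32.7 and bins_per_octave=36, and the reading of a +p% pitch shift as alpha = 1 + p/100 are illustrative assumptions, not parameters taken from the thesis.

```python
import math

def cqt_bin(freq_hz, fmin_hz=32.7, bins_per_octave=36):
    """Fractional CQT bin index of a frequency (bins are log-spaced)."""
    return bins_per_octave * math.log2(freq_hz / fmin_hz)

def pitch_shift_bins(percent, bins_per_octave=36):
    """Vertical offset, in bins, caused by a pitch shift of `percent` percent.

    Assumption: a +p% shift multiplies every frequency by alpha = 1 + p/100.
    """
    alpha = 1 + percent / 100.0
    return bins_per_octave * math.log2(alpha)

# Low, middle, and high partials all move by the same number of bins.
for f in (110.0, 440.0, 3520.0):
    shift = cqt_bin(f * 1.06) - cqt_bin(f)
    print(f"{f:7.1f} Hz -> shifted by {shift:.3f} bins")
print(f"predicted offset: {pitch_shift_bins(6):.3f} bins")
```

With these illustrative settings a +6% shift moves every peak by roughly three bins. Because that offset is fractional, a CQT with integer bins has to round it, so peaks land near, rather than exactly on, their translated positions.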

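The abstract does not spell out the hashing scheme, so the following is only a generic illustration of why triplets of CQT peaks can yield shift-resistant hashes. It stores differences between bin indices instead of absolute bins, so a constant vertical translation cancels out. It also stores a quantized ratio of time gaps, which is unchanged by a uniform time stretch such as a linear speed change. The (frame, bin) peak format, the names triplet_hash and local_triplets, and the max_dt, max_df, and ratio_steps parameters are hypothetical. The bounded zone used here is a plain rectangular target zone, not the vertically segmented target zone proposed in the thesis.

```python
from itertools import combinations

def triplet_hash(a, b, c, ratio_steps=32):
    """Hash three (frame, cqt_bin) peaks; unchanged by a constant bin offset."""
    (t1, f1), (t2, f2), (t3, f3) = sorted([a, b, c])
    if t3 == t1:
        raise ValueError("peaks must span more than one frame")
    # Differences of bin indices cancel a constant vertical (pitch) shift.
    df1, df2 = f2 - f1, f3 - f2
    # A ratio of time gaps cancels a uniform time stretch.
    ratio = round(ratio_steps * (t2 - t1) / (t3 - t1))
    return (df1, df2, ratio)

def local_triplets(peaks, max_dt=20, max_df=24):
    """Combine each anchor with pairs of later peaks inside a bounded zone.

    The rectangular zone (max_dt frames, +/- max_df bins) is a hypothetical
    stand-in used only to show how a limited zone keeps hashes local.
    """
    peaks = sorted(peaks)
    out = []
    for i, anchor in enumerate(peaks):
        zone = [p for p in peaks[i + 1:]
                if 0 < p[0] - anchor[0] <= max_dt
                and abs(p[1] - anchor[1]) <= max_df]
        for b, c in combinations(zone, 2):
            # Keep the anchor time so matches can vote on a common time offset.
            out.append((triplet_hash(anchor, b, c), anchor[0]))
    return out

peaks = [(0, 40), (5, 46), (9, 43), (14, 50), (30, 80)]
shifted = [(t, f + 3) for t, f in peaks]  # simulate a 3-bin upward pitch shift
print(local_triplets(peaks) == local_triplets(shifted))
```

Shifting every peak up by the same number of bins leaves both the zone membership and the hashes unchanged, which is the property the CQT step is meant to provide. A full system would also pack each hash into an integer key and store it with a track identifier; that part is omitted here.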