Reading metadata (id3 tag) of *.mp3 files


Post by flotul »

Hi there,

I'm trying to extract some metadata from *.mp3 files, such as BPM and Genre information.

I have made a very short test song (attachment) where the Genre is "Pop" and the BPM is "92".

Looking at this audio file with a hex editor, here is what I get:
[Image: hex editor view of the test file]

So apparently the header for Genre is "TCON" and the one for BPM is "TBPM".

OPEN DefaultDir$ + "\TestSong.mp3" FOR binary AS #TITLE

FOR i = 0 TO 100
    PRINT INPUT$(#TITLE, 1);
NEXT i
WAIT

This code will give different results when run in LB or LBB.

Here in LB:
[Image: program output in LB]

Here in LBB:
[Image: program output in LBB]

I always use LBB, but I get more usable results with LB because in LBB the output doesn't show the Genre ("TCON") header.

Is there a better way to extract this type of data with LBB please?
Attachments
TestSong.zip
(62.66 KiB)

Post by guest »

flotul wrote (Wed Feb 12, 2025 6:06 pm):
"This code will give different results when run in LB or LBB."

You can see from your hex dump that the text you are reading from the MP3 file is encoded in Unicode (UTF-16) format. Indeed there is an explicit UTF-16 BOM (Byte Order Mark) FF FE immediately preceding the text.

If you make allowance in your code for it being UTF-16, you should find that LB and LBB behave the same way (LBB has built-in support for UTF-8 but not for UTF-16; LB4 supports neither).
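One way to check this from BASIC itself, rather than from a hex editor, is to print byte values instead of characters. The following is a minimal, untested sketch: it assumes the same TestSong.mp3 in DefaultDir$, and the handle name #f is arbitrary. Because it prints numbers rather than text, the output should not depend on how LB or LBB renders characters.

```basic
' Print the first 64 bytes of the file as two-digit hex values, 16 per line
open DefaultDir$ + "\TestSong.mp3" for binary as #f
for i = 1 to 64
    b = asc(input$(#f, 1))
    print right$("0" + dechex$(b), 2); " ";
    if i mod 16 = 0 then print
next i
close #f
end
```

If the text really is UTF-16, the bytes FF FE should show up right before each text value, followed by every character's byte alternating with 00.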

flotul wrote:
"Is there a better way to extract this type of data with LBB please?"

There's nothing wrong with the method you have used. Your mistake is that you expected the text to be in ASCII/ANSI format but it isn't. I haven't looked at the MP3 specification but it may be that other text encodings are supported, in which case your program would need to be able to adapt to those too.
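For reference, the ID3v2 tag format does define more than one text encoding: the content of each text frame starts with a one-byte encoding marker. A minimal sketch of mapping that byte to a name might look like this. The function name EncodingName$ is made up for illustration, and values 2 and 3 only occur in ID3v2.4 tags.

```basic
' The byte value 1 is what a UTF-16 text frame with a BOM would carry
print EncodingName$(1)
end

' Hypothetical helper: name the encoding given the first content byte of a text frame
function EncodingName$(e)
    select case e
        case 0
            EncodingName$ = "ISO-8859-1"
        case 1
            EncodingName$ = "UTF-16 with BOM"
        case 2
            EncodingName$ = "UTF-16BE without BOM"
        case 3
            EncodingName$ = "UTF-8"
        case else
            EncodingName$ = "unknown"
    end select
end function
```

A program that checks this byte first can decide per frame whether any conversion is needed at all.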

How much effort you put into decoding UTF-16 will depend on whether your program needs to support accented characters, foreign alphabets (e.g. Cyrillic, Greek), right-to-left printing languages (e.g. Hebrew) and/or complex scripts (e.g. Arabic). Unicode text handling is a very complicated subject!

One approach you could consider is using the Windows WideCharToMultiByte API function to convert the UTF-16 text to UTF-8, and then use LBB's built-in support for UTF-8 to print it out.
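A rough, untested sketch of that API call is shown below. The function name Utf16ToUtf8$ is hypothetical, and the input is assumed to be raw UTF-16LE bytes with the FF FE marker already removed. The value 65001 is the Windows code page number for UTF-8.

```basic
' "Pop" as UTF-16LE bytes, built by hand for demonstration
w$ = "P" + chr$(0) + "o" + chr$(0) + "p" + chr$(0)
print Utf16ToUtf8$(w$)
end

' Hypothetical helper: convert UTF-16LE bytes (no BOM) to UTF-8 via the Windows API
function Utf16ToUtf8$(w$)
    nChars = int(len(w$) / 2)
    if nChars = 0 then exit function
    ' each 16-bit unit needs at most 3 bytes in UTF-8
    size = nChars * 3
    buf$ = space$(size) + chr$(0)
    ' the last two arguments must be null when converting to UTF-8
    calldll #kernel32, "WideCharToMultiByte", 65001 as ulong, 0 as ulong,_
    w$ as ptr, nChars as long, buf$ as ptr, size as long,_
    0 as long, 0 as long, n as long
    ' n is the number of bytes written, or 0 on failure
    Utf16ToUtf8$ = left$(buf$, n)
end function
```

In LBB the returned string could then be printed directly; in LB4 only the plain ASCII characters would be expected to display correctly.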

Post by flotul »

Thanks a lot, whoever you are 👍

I didn't expect to face that kind of difficulty, which is far beyond my basic knowledge.

Maybe I need a different approach to what I'm aiming to do: a listview of mp3 files with three columns (filename, genre and BPM) that can be sorted.

I'll have a look at the API but my inexperience there will probably keep me away from that solution.

Anyway, thanks a lot again 😉

Post by guest »

flotul wrote (Thu Feb 13, 2025 8:40 am):
"I'll have a look at the API but my inexperience there will probably keep me away from that solution."

Fair enough. The API is the best approach if you want to retain accents and foreign-language characters, and you should have no trouble finding existing Liberty BASIC code that calls it and that you can copy. But if you don't need those, just do a crude UTF-16 to UTF-8 (or UTF-16 to ASCII) conversion yourself in BASIC code.
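As an illustration of what such a crude conversion could look like, here is an untested sketch of UTF-16LE to UTF-8 in plain BASIC. The function name CrudeUtf8$ is invented for this example, the input is assumed to have had its FF FE marker removed, and surrogate pairs (characters outside the 16-bit range) are deliberately not handled.

```basic
' "é" is U+00E9, stored in UTF-16LE as the bytes E9 00
print len(CrudeUtf8$(chr$(233) + chr$(0)))
end

' Hypothetical helper: crude UTF-16LE (no BOM) to UTF-8, ignoring surrogate pairs
function CrudeUtf8$(w$)
    for i = 1 to len(w$) - 1 step 2
        ' combine low byte and high byte into one 16-bit value
        cp = asc(mid$(w$, i, 1)) + 256 * asc(mid$(w$, i + 1, 1))
        if cp = 0 then exit for
        select case
            case cp < 128
                ' plain ASCII: one byte
                CrudeUtf8$ = CrudeUtf8$ + chr$(cp)
            case cp < 2048
                ' two-byte sequence
                CrudeUtf8$ = CrudeUtf8$ + chr$(192 + int(cp / 64)) + chr$(128 + (cp mod 64))
            case else
                ' three-byte sequence
                CrudeUtf8$ = CrudeUtf8$ + chr$(224 + int(cp / 4096))
                CrudeUtf8$ = CrudeUtf8$ + chr$(128 + (int(cp / 64) mod 64)) + chr$(128 + (cp mod 64))
        end select
    next i
end function
```

For a pure ASCII result, the two longer branches could simply append "?" instead.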

I don't know what part of the world you are from, but dealing with international character sets is commonplace in most regions - but sadly not in the USA where it can be something of a culture shock. :D

Post by Rod »

I don't have LBB loaded on my current PC, but I have coded this. I'm not sure I understand why there would be a difference running under LBB. As I see it, while the text in the file may be UTF encoded, the data bytes are still bytes, so the tags and size bytes are just normal. The issue is in reading the text, which as Richard has clarified starts with the FFFE marker. Looking at the mp3 file, we see that the text uses two bytes to define a single character.

This code seems to extract what you want, though I have fudged the lengths for TCON and TBPM because I don't yet fully understand the encoding for those tags, but they can be found.

Still some work to do but it may help you on the way. (It may be that we need to handle FEFF as well as FFFE, easy enough to skip through in a different order)
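Handling both markers could look something like the untested sketch below. The function name BomToAscii$ is hypothetical; it keeps only plain ASCII characters and replaces everything else with "?".

```basic
' "92" in big-endian order with an FE FF marker, built by hand for demonstration
print BomToAscii$(chr$(254) + chr$(255) + chr$(0) + "9" + chr$(0) + "2")
end

' Hypothetical helper: crude UTF-16 to ASCII for either byte order
function BomToAscii$(u$)
    bom$ = left$(u$, 2)
    ' loPos is the offset of the low-order byte inside each 2-byte unit
    loPos = -1
    if bom$ = chr$(255) + chr$(254) then loPos = 0
    if bom$ = chr$(254) + chr$(255) then loPos = 1
    if loPos < 0 then
        ' no marker found: return the bytes unchanged
        BomToAscii$ = u$
        exit function
    end if
    for n = 3 to len(u$) - 1 step 2
        lo = asc(mid$(u$, n + loPos, 1))
        hi = asc(mid$(u$, n + 1 - loPos, 1))
        ' a unit of two zero bytes terminates the text
        if lo = 0 and hi = 0 then exit for
        if hi = 0 and lo < 128 then
            BomToAscii$ = BomToAscii$ + chr$(lo)
        else
            BomToAscii$ = BomToAscii$ + "?"
        end if
    next n
end function
```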

I do understand that this is the easy kind of UTF decoding; other encodings are, as Richard points out, too complex.

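'let the user pick a file and read the first 1028 bytes, where the tag frames are expected
filedialog "Open media file", "*.mp3", fileName$
open fileName$ for input as #title
l=lof(#title)
s$=input$(#title,1028)
'locate the frames of interest (0 means not found):
'TIT2 = title, TPE1 = performer, TCON = genre, TBPM = beats per minute
t=instr(s$,"TIT2")
p=instr(s$,"TPE1")
c=instr(s$,"TCON")
b=instr(s$,"TBPM")
if p>0 then
    'find the length of the performer's name:
    'the 4 bytes after the frame ID hold the size, most significant byte first
    l=asc(mid$(s$,p+7,1))+asc(mid$(s$,p+6,1))*256+asc(mid$(s$,p+5,1))*65536+asc(mid$(s$,p+4,1))*16777216
    'the text starts 11 bytes in, after the frame header and one encoding byte
    perfo$=unitoasc$(mid$(s$,p+11,l-1))
else
    perfo$="Unknown"
end if
if t>0 then
    l=asc(mid$(s$,t+7,1))+asc(mid$(s$,t+6,1))*256+asc(mid$(s$,t+5,1))*65536+asc(mid$(s$,t+4,1))*16777216
    title$=unitoasc$(mid$(s$,t+11,l-1))
else
    title$="Unknown"
end if

if c>0 then
    'fudged: fixed length of 12 bytes instead of the frame size
    content$=unitoasc$(mid$(s$,c+11,12))
else
    content$="Unknown"
end if

if b>0 then
    'fudged: fixed length of 3 bytes instead of the frame size
    bpm$=unitoasc$(mid$(s$,b+11,3))
else
    bpm$="Unknown"
end if

close #title
print perfo$
print title$
print content$
print bpm$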


m$=GetShortPathName$(fileName$)

'open song
r$=mciSendString$("open "+m$+" alias song")
'r$=mciSendString$("open "+m$+" type MpegVideo alias song")

'set song volume
vol=500

'get song length
songlength = VAL(mciSendString$("status song length"))
min=int(songlength/1000/60)
sec=int(songlength/1000-min*60)
songmin$=right$("00"+str$(min),2)
songsec$=right$("00"+str$(sec),2)
'play song
r$=mciSendString$("setaudio song volume to ";vol)
r$=mciSendString$("play song")
wait

function GetShortPathName$(lPath$)
lPath$=lPath$+chr$(0)
sPath$=space$(256)
lenPath=len(sPath$)
calldll #kernel32, "GetShortPathNameA",lPath$ as ptr,_
sPath$ as ptr,lenPath as long,r as long
GetShortPathName$=left$(sPath$,r)
end function

function mciSendString$(s$)
    'reply buffer: 1024 spaces plus a terminating null
    buffer$=space$(1024)+chr$(0)
    'tell the API the usable buffer length, no more than was allocated
    calldll #winmm,"mciSendStringA",s$ as ptr,buffer$ as ptr,_
    1024 as long, 0 as long, r as long
    buffer$=trim$(buffer$)
    if r>0 then
        'non-zero result is an error code, so fetch the error text instead
        buffer2$=space$(129)
        calldll #winmm,"mciGetErrorStringA", r as long, buffer2$ as ptr,_
        128 as ulong, r as boolean
        mciSendString$=buffer2$
    else
        mciSendString$=buffer$
    end if
end function

function unitoasc$(u$)
    if left$(u$,2)=chr$(hexdec("FF"))+chr$(hexdec("FE")) then
        'step through the double unicode extracting the single asc
        for n=3 to len(u$) step 2
            'stop at a null terminator before it gets appended to the result
            if mid$(u$,n,1)=chr$(0) then exit for
            unitoasc$=unitoasc$+mid$(u$,n,1)
        next
    else
        unitoasc$=u$
    end if
end function
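
The fixed lengths used for TCON and TBPM in the listing above could probably be replaced by the same size calculation already used for TIT2 and TPE1, since all four are ordinary text frames. The untested sketch below wraps that in one helper. The name FrameBody$ is made up, and it assumes an ID3v2.3 tag with plain big-endian frame sizes (ID3v2.4 stores sizes differently) and a frame that lies entirely within the bytes already read.

```basic
' Build a sample TBPM frame in memory: ID, 4-byte size, 2 flag bytes, then content
z$ = chr$(0)
body$ = chr$(1) + chr$(255) + chr$(254) + "9" + z$ + "2" + z$
s$ = "TBPM" + z$ + z$ + z$ + chr$(len(body$)) + z$ + z$ + body$
f$ = FrameBody$(s$, "TBPM")
print "content bytes: "; len(f$)
print "encoding byte: "; asc(left$(f$, 1))
end

' Hypothetical helper: return the content of the first frame with the given ID
' (one encoding byte followed by the text), or an empty string if not found
function FrameBody$(s$, id$)
    p = instr(s$, id$)
    if p = 0 then exit function
    ' size is stored most significant byte first, right after the 4-byte ID
    size = asc(mid$(s$, p + 4, 1)) * 16777216 + asc(mid$(s$, p + 5, 1)) * 65536
    size = size + asc(mid$(s$, p + 6, 1)) * 256 + asc(mid$(s$, p + 7, 1))
    ' content starts after the 10-byte frame header
    FrameBody$ = mid$(s$, p + 10, size)
end function
```

Everything after the first byte of the result, for example mid$(f$, 2), is the text including its FF FE marker, which is the form unitoasc$ expects.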

Post by Rod »

Oh, I see that LBB displays the UTF as a single character stream. I wonder then if once fetched it is a normal asc stream?

Edit, no, of course not, it stays as a UTF string until we change it. But in LBB it will display normally.

Post by guest »

Rod wrote (Sun Feb 23, 2025 3:49 pm):
"Edit, no, of course not, it stays as a UTF string until we change it. But in LBB it will display normally."

"UTF string" is ambiguous. You need to specify whether it's UTF-8, UTF-16LE, UTF-16BE, UTF-32LE or UTF-32BE.

BBC BASIC (which is what LBB is using under the hood) uses UTF-8 internally, so the other formats need to be converted to UTF-8 if you want them to be displayed correctly.

On Windows, you can use the WideCharToMultiByte API function for that.