Hello,
I'm trying to extract the video ID from a YouTube video URL.
Example:
http://www.youtube.com/watch?v=j4qJ0cKeeka&mode=related&search=something
http://www.youtube.com/watch?v=j4qJ0cKeeka

I plan to use the code below, but I can't figure out which pattern will match the ID.

preg_match('???', $url, $matches)

Thanks in advance for any help.

Comments

cooperaj’s picture

This may be incorrect but you should look into sub-patterns. I think this may be close (not tested)

'/v=(\w+)&/'

Thanks
Adam

erinc’s picture

Thanks, it works partially. It doesn't match the - (minus) character. For example, if the video ID is 23d23-d232, it doesn't match. Do you know how I should modify it? Thanks again.

cooperaj’s picture

Sorry I didn't realise that it may have dashes in. I only went on the url you posted.

You should use instead:

'/v=(.+)&/'

The '\w' pattern only matches word characters (a-z, A-Z, 0-9 and _), whereas the '.' pattern matches any character except a newline.
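A quick way to see the difference between the two patterns is to run both against an ID that contains a dash. This is only a minimal sketch; the sample URL reuses the dashed ID mentioned earlier in the thread and is otherwise made up for illustration.

```php
<?php
// Hypothetical URL whose video ID contains a dash.
$url = 'http://www.youtube.com/watch?v=23d23-d232&mode=related';

// \w stops at the dash, so the required trailing '&' is never reached
// and the whole match fails.
var_dump(preg_match('/v=(\w+)&/', $url, $m1)); // int(0)

// '.' accepts the dash, so the ID is captured up to the '&'.
preg_match('/v=(.+)&/', $url, $m2);
echo $m2[1], "\n"; // 23d23-d232
```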

Thanks
Adam

erinc’s picture

Yes, it matched dashes now thanks.
I have another small issue now.

$url = "http://www.youtube.com/watch?v=szygGmDsAl4&search_query=testing&search=Search";
$pattern = '/v=(.+)&/';
preg_match($pattern, $url, $matches);
print_r($matches);

With the above I get the following array:
Array ( [0] => v=szygGmDsAl4&search_query=testing& [1] => szygGmDsAl4&search_query=testing )

erinc’s picture

I found a bad trick...

$url = "http://www.youtube.com/watch?v=szygGmDsAl4&search_query=testing&search=Search";
$pattern = '/v=(.+)/';
preg_match($pattern, $url, $matches);
$firstpart = strtok($matches[1],"&");
print $firstpart;

This works, but I would love to do it with just regular expressions.

nicholasthompson’s picture

try...

$url = "http://www.youtube.com/watch?v=szygGmDsAl4&search_query=testing&search=Search";
$pattern = '/v=(.+?)&/';
preg_match($pattern, $url, $matches);
print_r($matches);

The ampersand has no special meaning in a PCRE pattern, so it doesn't need to be escaped.

The ? after the + means "don't be greedy": match as few characters as possible.

For more info:
http://www.regular-expressions.info/
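The greedy versus non-greedy behaviour described above can be compared side by side. The following is a small sketch using the same URL as the thread; the variable names $greedy and $lazy are just illustrative.

```php
<?php
$url = 'http://www.youtube.com/watch?v=szygGmDsAl4&search_query=testing&search=Search';

// Greedy: .+ grabs as much as it can, then backs off to the LAST '&'.
preg_match('/v=(.+)&/', $url, $greedy);
echo $greedy[1], "\n"; // szygGmDsAl4&search_query=testing

// Lazy: .+? grows one character at a time and stops at the FIRST '&'.
preg_match('/v=(.+?)&/', $url, $lazy);
echo $lazy[1], "\n"; // szygGmDsAl4
```

Note that both variants still require an '&' somewhere after the ID.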

erinc’s picture

Thank you, that works perfectly. I need to do some regex reading now :)

erinc’s picture

Hmm, when the URL is just $url = "http://www.youtube.com/watch?v=szygGmDsAl4"; (it doesn't include a "&"), the pattern doesn't match anything.

dman’s picture

$pattern = '/v=([^&]+)/';

- [^&]+ matches one or more characters that are NOT an ampersand.
There are any number of ways of attacking it, depending on how many times you want to keep changing the ground rules ;)

A true solution would be to use one of PHP's built-in URL-parsing routines. They are designed to be bullet-proof. You'll find examples on php.net in the comments for parse_url and similar.
Regexps can always find ways to be broken - imagine if your source URL was strictly URL-encoded : with %26 for the &. It's valid, but the regexp will miss it.
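As a rough check of the negated character class, the sketch below runs the pattern against both URL shapes discussed so far, with and without trailing parameters. The $urls array is only there for illustration.

```php
<?php
$pattern = '/v=([^&]+)/';

$urls = [
    'http://www.youtube.com/watch?v=szygGmDsAl4',
    'http://www.youtube.com/watch?v=szygGmDsAl4&search_query=testing&search=Search',
];

$ids = [];
foreach ($urls as $url) {
    // [^&]+ stops at the first '&' or at the end of the string,
    // whichever comes first, so no trailing '&' is required.
    if (preg_match($pattern, $url, $matches)) {
        $ids[] = $matches[1];
        echo $matches[1], "\n"; // szygGmDsAl4 both times
    }
}
```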

.dan.
How to troubleshoot Drupal | http://www.coders.co.nz/

Steven’s picture

If the ampersand is urlencoded, then it is still part of the value, and the regexp will do the right thing. You will need to call urldecode() afterwards, just like you need to do with parse_url().
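To illustrate the point, here is a small sketch with a made-up URL (example.com and the value abc%26def are assumptions, not real video data) in which the value of v contains an encoded ampersand.

```php
<?php
// Hypothetical URL: the value of v contains an encoded ampersand (%26).
$url = 'http://www.example.com/watch?v=abc%26def&mode=related';

// The regexp treats %26 as ordinary characters and stops at the real '&'.
preg_match('/v=([^&]+)/', $url, $matches);
echo $matches[1], "\n";            // abc%26def
echo urldecode($matches[1]), "\n"; // abc&def

// parse_url() also leaves the query string encoded.
echo parse_url($url, PHP_URL_QUERY), "\n"; // v=abc%26def&mode=related
```

If the query string is passed on to parse_str(), the values come back already decoded, so the extra urldecode() call is only needed on the regexp and raw parse_url() paths.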

--
If you have a problem, please search before posting a question.

dman’s picture

Sorry, it wasn't url-encoding I was thinking of, it was xhtml entity-encoding.

<a href=
    "http://www.youtube.com/watch?v=j4qJ0cKeeka&mode=related&search=something">
    encode</a>

is invalid XHTML. If you are running strict, it should be:

<a href=
    "http://www.youtube.com/watch?v=j4qJ0cKeeka&amp;mode=related&amp;search=something">
    encode</a>

XML Sources, eg RSS feeds, should probably be using this syntax if they are pretending to be valid, and not just raw-text-that-happens-to-look-like-html.

I imagine &#38; will also work, but I've encountered inconsistencies when using a dozen different XML processors. Anyway, either is a pain to parse with just regexps.
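One possible way around the entity problem, sketched here under the assumption that the href value has already been pulled out of the markup, is to decode the entities first and only then parse the URL.

```php
<?php
// An href value as it would appear in strict XHTML or an XML feed.
$href = 'http://www.youtube.com/watch?v=j4qJ0cKeeka&amp;mode=related&amp;search=something';

// Turn &amp; (and numeric forms such as &#38;) back into a plain '&'.
$url = html_entity_decode($href, ENT_QUOTES, 'UTF-8');

// Split the query string into an array of parameters.
parse_str((string) parse_url($url, PHP_URL_QUERY), $query);
echo $query['v'], "\n";    // j4qJ0cKeeka
echo $query['mode'], "\n"; // related
```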


rkendall’s picture

You should have no problems with this, and it's perhaps a bit more the PHP way...

$url = 'http://www.youtube.com/watch?v=szygGmDsAl4&search_query=testing&search=Search';
$parsed_url = parse_url($url);
parse_str($parsed_url['query'], $parsed_query);
print_r($parsed_query);

The video ID is of course $parsed_query['v']
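The same approach could be wrapped in a small helper that also copes with URLs that have no query string or no v parameter. This is only a sketch: the function name youtube_video_id is hypothetical, and the nullable type hints assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper; returns null when there is no usable v parameter.
function youtube_video_id(string $url): ?string
{
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null; // no query string, or the URL could not be parsed
    }
    parse_str($query, $params);
    // v could be an array if the URL contains v[]=..., so check the type.
    return isset($params['v']) && is_string($params['v']) ? $params['v'] : null;
}

var_dump(youtube_video_id('http://www.youtube.com/watch?v=szygGmDsAl4&search=Search')); // string(11) "szygGmDsAl4"
var_dump(youtube_video_id('http://www.youtube.com/watch')); // NULL
```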

--
Ross Kendall
UK based Web and IT consultant specialising in Free and Open Source Software technologies.
http://rosskendall.com

dman’s picture

That is indeed MUCH better and more accurate.
... when you already have the URL in a string of its own.

You'd choose to use regexps, however, when parsing an entire lump of random HTML.

... so you'll need to combine a regexp that locates all links (or embeds?) and then calls back (regexp execute) to this URL-parser.

Exercise for the reader :)
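One way the exercise might be approached is sketched below: a deliberately simple regexp collects the href values, and each one is then handed to the URL parser. The HTML fragment is invented for illustration, the pattern only handles double-quoted href attributes, and the host check is an assumption about what should count as a YouTube link.

```php
<?php
// Hypothetical HTML fragment for illustration.
$html = '<p><a href="http://www.youtube.com/watch?v=j4qJ0cKeeka&amp;mode=related">one</a> '
      . '<a href="http://example.com/page">other</a> '
      . '<a href="http://youtube.com/watch?v=-j5J7lXav7Y">two</a></p>';

// Step 1: collect double-quoted href values with a simple regexp.
preg_match_all('/href="([^"]+)"/i', $html, $links);

$ids = [];
foreach ($links[1] as $href) {
    // Step 2: undo entity encoding, then let the URL parser do the rest.
    $url = html_entity_decode($href, ENT_QUOTES, 'UTF-8');
    $host = parse_url($url, PHP_URL_HOST);
    // Accept youtube.com and its subdomains only.
    if (!is_string($host) || !preg_match('/(^|\.)youtube\.com$/i', $host)) {
        continue;
    }
    parse_str((string) parse_url($url, PHP_URL_QUERY), $query);
    if (isset($query['v']) && is_string($query['v'])) {
        $ids[] = $query['v'];
    }
}

print_r($ids); // j4qJ0cKeeka and -j5J7lXav7Y
```

For anything beyond simple markup, a real HTML parser would likely be more robust than the href regexp used in step 1.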

(or use the HTML link-extor or WTF it's called)


groundh0g’s picture

Sorry to reply to such an old post, but I just wrote a regex for this very task. If nothing else, it will be good to get feedback from the more regex-savvy folks.

(?:[Yy][Oo][Uu][Tt][Uu][Bb][Ee]\.[Cc][Oo][Mm]/watch\?v=)(\w[\w|-]*)

This version matches the following URLs, only returning the text that I'm interested in (i.e. no "v=").

Of course, it's susceptible to phishing from domains that end with "youtube.com" (like "www.PhishingForYouTube.com"), but I wanted this to support feeds from subdomains of youtube like the ones used for their API pages.

groundh0g’s picture

By the way, I didn't bother to make the "/watch" part case-insensitive because then the link wouldn't be valid. Someone set up a profile under the username "watch". If "watch" is typed using mixed-case letters, you'll visit that profile, regardless of the text that follows it.

d0t101101’s picture

Thanks for this.

A problem with the above regex is that it won't match YouTube URLs like this:
http://youtube.com/watch?v=-j5J7lXav7Y

I've tried quite a few variations, without success. I'll post a solution here once found.

Regards,
.

d0t101101’s picture

Here is a regex that solves the above issue, and properly handles virtually all youtube video urls:

/youtube\.com\/watch\?v=([A-Za-z0-9._%-]*)[&\w;=\+_\-]*/

lord_of_freaks’s picture

Here is my solution

/youtube\.com\/watch\?v=([^&]+)/i

ReaALFAnTaSy’s picture

'/v=[^&]*/'

noquery’s picture

I use the RE below on my site, and it works well for me. Its inner part, i.e. the sub-pattern in parentheses, returns the YouTube video ID, while the complete RE returns the YouTube URL up to the next whitespace.

/youtube\.com\/watch\?v=(\S*)/