Regular expression question

By erinc on 9 Oct 2006 at 09:51 UTC

Hello,
I'm trying to extract the video ID from YouTube video URLs.
Examples:
http://www.youtube.com/watch?v=j4qJ0cKeeka&mode=related&search=something
http://www.youtube.com/watch?v=j4qJ0cKeeka

I will use the code below, but I can't figure out the pattern to match.

preg_match('???', $url, $matches);

Thanks in advance for any help.

Comments

This may be incorrect but

cooperaj commented 9 October 2006 at 10:11

This may be incorrect, but you should look into subpatterns (capturing groups). I think this may be close (not tested):

'/v=(\w+)&/'
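A quick sketch of how the subpattern is read back, using the first example URL from the question. The part in parentheses is captured into $matches[1], while $matches[0] holds the whole match, including "v=" and the trailing "&". As written, the pattern needs a "&" after the ID.

```php
<?php
// The first example URL from the question.
$url = 'http://www.youtube.com/watch?v=j4qJ0cKeeka&mode=related&search=something';

// (\w+) is a capturing subpattern; its text ends up in $matches[1].
if (preg_match('/v=(\w+)&/', $url, $matches)) {
    $full = $matches[0]; // whole match: "v=j4qJ0cKeeka&"
    $id   = $matches[1]; // subpattern only: "j4qJ0cKeeka"
    echo $id, "\n";
}
```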

Thanks
Adam

Thanks it works partially.

erinc commented 9 October 2006 at 10:23

Thanks, it works partially. It doesn't match the - (minus) character. For example, if the video ID is 23d23-d232, it doesn't match. Do you know how I should modify it? Thanks again.

Sorry I didn't realise that

cooperaj commented 9 October 2006 at 10:32

Sorry, I didn't realise it might contain dashes. I only went on the URL you posted.

You should use instead:

'/v=(.+)&/'

The '\w' pattern only matches word characters (a-z, A-Z, 0-9 and _), whereas the '.' pattern matches any character except a newline.
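To see the difference on the dashed ID from the previous comment, here is a small comparison sketch. The URL is hypothetical, and the third pattern, a character class of word characters plus a literal hyphen, is an alternative not posted in this thread up to this point.

```php
<?php
// Hypothetical URL using the dashed ID from the previous comment.
$url = 'http://www.youtube.com/watch?v=23d23-d232&mode=related';

$patterns = [
    'word'      => '/v=(\w+)&/',     // \w = letters, digits, underscore: no hyphen
    'any'       => '/v=(.+)&/',      // . = any character except newline
    'word_dash' => '/v=([\w-]+)&/',  // word characters plus a literal hyphen
];

$results = [];
foreach ($patterns as $name => $pattern) {
    // preg_match() returns 1 on a match, 0 on no match.
    $results[$name] = preg_match($pattern, $url, $m) ? $m[1] : null;
}
print_r($results);
```

Both '.' and '[\w-]' accept the dash here; the character class has the advantage that it can never run past a "&".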

Thanks
Adam

Yes, it matched dashes now

erinc commented 9 October 2006 at 10:48

Yes, it matches dashes now, thanks.
I have another small issue now.

$url = "http://www.youtube.com/watch?v=szygGmDsAl4&search_query=testing&search=Search";
$pattern = '/v=(.+)&/';
preg_match($pattern, $url, $matches);
print_r($matches);

With the above, I get the following array:
Array ( [0] => v=szygGmDsAl4&search_query=testing& [1] => szygGmDsAl4&search_query=testing )

I found a bad

erinc commented 9 October 2006 at 11:16

I found a bad trick...

$url = "http://www.youtube.com/watch?v=szygGmDsAl4&search_query=testing&search=Search";
$pattern = '/v=(.+)/';
preg_match($pattern, $url, $matches);
$firstpart = strtok($matches[1],"&");
print $firstpart;

This works, but I would love to do it with just regular expressions.

try...

nicholasthompson commented 9 October 2006 at 11:27

try...

$url = "http://www.youtube.com/watch?v=szygGmDsAl4&search_query=testing&search=Search";
$pattern = '/v=(.+?)&/';
preg_match($pattern, $url, $matches);
print_r($matches);

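You don't need to escape the ampersand in the pattern: "&" has no special meaning in a regex, although a backslash before it does no harm.

The ? after the + means "don't be greedy": the quantifier matches as few characters as possible, so the group stops at the first "&".

For more info: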
http://www.regular-expressions.info/
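A small side-by-side sketch of the greedy and lazy forms, using erinc's URL from above, may make the difference clearer:

```php
<?php
// The URL from erinc's follow-up.
$url = "http://www.youtube.com/watch?v=szygGmDsAl4&search_query=testing&search=Search";

// Greedy: .+ runs to the end of the string, then backs off to the LAST "&".
preg_match('/v=(.+)&/', $url, $greedy);

// Lazy: .+? grows one character at a time and stops at the FIRST "&".
preg_match('/v=(.+?)&/', $url, $lazy);

echo $greedy[1], "\n"; // szygGmDsAl4&search_query=testing
echo $lazy[1], "\n";   // szygGmDsAl4

// Both variants still need a "&" after the ID to match at all.
$noAmp = preg_match('/v=(.+?)&/', 'http://www.youtube.com/watch?v=szygGmDsAl4');
```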

Thank you, that works

erinc commented 9 October 2006 at 11:33

Thank you, that works perfectly. I need to do some regex reading now :)

Hmm, when the $url is just

erinc commented 9 October 2006 at 11:44

Hmm, when the $url is just $url = "http://www.youtube.com/watch?v=szygGmDsAl4" (which doesn't include a "&"), the pattern doesn't match anything.

$pattern = '/v=([^&]+)/'; -

dman commented 9 October 2006 at 12:33

$pattern = '/v=([^&]+)/';

- matches one or more characters that are NOT an ampersand, so no trailing "&" is required.
There are any number of ways of attacking it, depending on how many times you want to keep changing the ground rules ;)
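A quick check of the negated character class against the URL shapes raised so far (the dashed-ID URL is hypothetical):

```php
<?php
$urls = [
    'http://www.youtube.com/watch?v=szygGmDsAl4&search_query=testing&search=Search',
    'http://www.youtube.com/watch?v=szygGmDsAl4',              // no "&" at all
    'http://www.youtube.com/watch?v=23d23-d232&mode=related',  // dashed ID
];

$ids = [];
foreach ($urls as $url) {
    // [^&]+ : one or more characters that are not "&"; no trailing "&" required.
    $ids[] = preg_match('/v=([^&]+)/', $url, $m) ? $m[1] : null;
}
print_r($ids);
```

One caveat in the "changing ground rules" spirit: "v=" is matched anywhere, so a parameter whose name merely ends in "v" (a hypothetical "rev=" before the "v=") would be picked up first.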

A true solution would be to use one of PHP's built-in URL-parsing functions. They are designed to be bullet-proof. You'll find examples on php.net in the comments for parse_url() and similar.
Regexps can always be broken somehow: imagine if your source URL was strictly URL-encoded, with %26 for the &. It's valid, but the regexp will miss it.

.dan.
How to troubleshoot Drupal | http://www.coders.co.nz/

_{.dan. is the New Zealand Drupal Developer working on Government Web Standards}

Urlencoding

Steven commented 9 October 2006 at 13:21

If the ampersand is URL-encoded, then it is still part of the value, and the regexp will do the right thing. You will need to call urldecode() afterwards, just like you need to with parse_url().
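A sketch of that point, assuming a hypothetical video ID that contains an encoded "&" (%26): the regex stops only at the real separator, and neither the regex nor parse_url() decodes anything by itself.

```php
<?php
// Hypothetical URL whose v value contains a URL-encoded "&" (%26).
$url = 'http://www.youtube.com/watch?v=abc%26def&mode=related';

// The regex stops at the real "&"; the encoded one stays inside the value.
preg_match('/v=([^&]+)/', $url, $m);
$raw     = $m[1];            // "abc%26def"
$decoded = urldecode($raw);  // "abc&def"

// parse_url() does not decode either: the query string comes back as-is.
$query = parse_url($url, PHP_URL_QUERY); // "v=abc%26def&mode=related"
```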

--
If you have a problem, please search before posting a question.

Whoops. off-by-one error

dman commented 9 October 2006 at 14:04

Sorry, it wasn't URL-encoding I was thinking of; it was XHTML entity encoding.

<a href=
    "http://www.youtube.com/watch?v=j4qJ0cKeeka&mode=related&search=something">
    encode</a>

is invalid XHTML. If you are running strict, it should be:

<a href=
    "http://www.youtube.com/watch?v=j4qJ0cKeeka&amp;mode=related&amp;search=something">
    encode</a>

XML sources, e.g. RSS feeds, should probably be using this syntax if they claim to be valid, rather than being raw text that happens to look like HTML.

I imagine &#38; will also work, but I've encountered inconsistencies across a dozen different XML processors. Anyway, either form is a pain to parse with just regexps.
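One way to cope with entity-encoded hrefs is to decode the entities before any parsing. A sketch, assuming the href was taken from strict XHTML or a feed (the input is hypothetical):

```php
<?php
// An href value as it might appear in strict XHTML or an RSS feed (hypothetical).
$href = 'http://www.youtube.com/watch?v=j4qJ0cKeeka&amp;mode=related&amp;search=something';

// For the ID alone, [^&]+ still stops at the "&" that starts "&amp;".
preg_match('/v=([^&]+)/', $href, $m);
$naiveId = $m[1]; // "j4qJ0cKeeka"

// Other parameters break, though: parse_str() sees a key named "amp;mode".
parse_str(parse_url($href, PHP_URL_QUERY), $undecoded);

// Decoding first fixes that; html_entity_decode() also turns &#38; into "&".
$url = html_entity_decode($href, ENT_QUOTES, 'UTF-8');
parse_str(parse_url($url, PHP_URL_QUERY), $params);
```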

.dan.

Without a regular expression

rkendall commented 9 October 2006 at 14:57

You should have no problems with this, and it's perhaps a bit more the PHP way...

$url = 'http://www.youtube.com/watch?v=szygGmDsAl4&search_query=testing&search=Search';
$parsed_url = parse_url($url);
// parse_str() splits the query string into an array of (URL-decoded) parameters.
parse_str($parsed_url['query'], $parsed_query);
print_r($parsed_query);

The video ID is of course $parsed_query['v'].
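The same idea can be wrapped in a small function that copes with URLs that have no query string or no "v" parameter. A sketch, assuming PHP 7.1 or later; the function name youtube_video_id() is hypothetical:

```php
<?php
/**
 * Hypothetical helper: returns the "v" query parameter of a URL, or null.
 */
function youtube_video_id(string $url): ?string {
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null; // no query string, or parse_url() rejected the URL
    }
    parse_str($query, $params);   // also URL-decodes the values
    return isset($params['v']) && is_string($params['v']) ? $params['v'] : null;
}

echo youtube_video_id('http://www.youtube.com/watch?v=szygGmDsAl4&search_query=testing&search=Search'), "\n";
```

Note that it does not check that the host is actually YouTube.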

--
Ross Kendall
UK based Web and IT consultant specialising in Free and Open Source Software technologies.
http://rosskendall.com

That is indeed MUCH better

dman commented 9 October 2006 at 15:34

That is indeed MUCH better and more accurate.
... when you already have the URL as a string of its own.

You'd choose to use regexps, however, when parsing an entire lump of random HTML.

... so you'll need to combine a regexp that locates all links (or embeds?) and then calls back (regexp execute) to this URL parser.

Exercise for the reader :)

(or use the HTML link-extor or WTF it's called)
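One possible take on the exercise, as a sketch only: it uses preg_match_all() and a loop rather than a callback, handles only double-quoted href attributes, and the HTML input and the "youtube.com or a subdomain" host check are assumptions.

```php
<?php
// A hypothetical lump of HTML containing several links.
$html = '<p>See <a href="http://www.youtube.com/watch?v=j4qJ0cKeeka&amp;mode=related">this</a>'
      . ' and <a href="http://youtube.com/watch?v=-j5J7lXav7Y">that</a>'
      . ' but not <a href="http://example.com/?v=nope">this one</a>.</p>';

// Step 1: a regexp only locates the href values (double-quoted ones, for simplicity).
preg_match_all('/href="([^"]+)"/i', $html, $found);

$ids = [];
foreach ($found[1] as $href) {
    // Step 2: decode entities, then let PHP's URL functions do the real parsing.
    $url   = html_entity_decode($href, ENT_QUOTES, 'UTF-8');
    $parts = parse_url($url);
    if (!is_array($parts) || !isset($parts['host'], $parts['query'])) {
        continue;
    }
    $host = strtolower($parts['host']);
    // Accept youtube.com and its subdomains only (hypothetical policy).
    if ($host !== 'youtube.com' && substr($host, -12) !== '.youtube.com') {
        continue;
    }
    parse_str($parts['query'], $params);
    if (isset($params['v'])) {
        $ids[] = $params['v'];
    }
}
print_r($ids);
```

Checking the host this way also sidesteps the "PhishingForYouTube.com" problem raised further down the thread.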

.dan.

One more way ...

groundh0g commented 28 September 2007 at 01:08

Sorry to reply to such an old post, but I just wrote a regex for this very task. If nothing else, it will be good to get feedback from the more regex-savvy folks.

(?:[Yy][Oo][Uu][Tt][Uu][Bb][Ee]\.[Cc][Oo][Mm]/watch\?v=)(\w[\w-]*)

This version matches YouTube watch URLs, returning only the text I'm interested in (i.e. no "v=").

Of course, it's susceptible to phishing from domains that end with "youtube.com" (like "www.PhishingForYouTube.com"), but I wanted this to support feeds from subdomains of youtube like the ones used for their API pages.
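A sketch of the pattern in use, written with [\w-] as the second character class (a "|" inside a character class would match a literal pipe, not act as "or"). The "#" delimiter avoids escaping "/", and the test URLs are hypothetical, including one that shows the phishing caveat.

```php
<?php
// The pattern above, with [\w-] as the class for the rest of the ID.
$pattern = '#(?:[Yy][Oo][Uu][Tt][Uu][Bb][Ee]\.[Cc][Oo][Mm]/watch\?v=)(\w[\w-]*)#';

$tests = [
    'http://www.youtube.com/watch?v=j4qJ0cKeeka&mode=related', // normal link
    'http://www.YouTube.com/watch?v=23d23-d232',                // mixed-case host, dashed ID
    'http://www.PhishingForYouTube.com/watch?v=abc123',         // the phishing caveat
];

$ids = [];
foreach ($tests as $url) {
    if (preg_match($pattern, $url, $m)) {
        // (?:...) is non-capturing, so $m[1] is the ID without "v=".
        $ids[] = $m[1];
    }
}
print_r($ids);
```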

BTW ...

groundh0g commented 28 September 2007 at 01:13

By the way, I didn't bother to make the "/watch" part case-insensitive because then the link wouldn't be valid. Someone set up a profile under the username "watch". If "watch" is typed using mixed-case letters, you'll visit that profile, regardless of the text that follows it.

Thanks for this. A problem

d0t101101 commented 11 June 2008 at 02:12

Thanks for this.

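A problem with the above regex is that it won't match YouTube URLs like this one, where the ID starts with a dash (the pattern requires the first character of the ID to match \w):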
http://youtube.com/watch?v=-j5J7lXav7Y

I've tried quite a few variations, without success. I'll post a solution here once found.

Regards,
.

Here is a regex that solves

d0t101101 commented 11 June 2008 at 02:29

Here is a regex that solves the above issue, and properly handles virtually all youtube video urls:

/youtube\.com\/watch\?v=([A-Za-z0-9._%-]*)[&\w;=\+_\-]*/
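A quick check of this pattern against the leading-dash URL from the previous comment and a URL with extra parameters:

```php
<?php
$pattern = '/youtube\.com\/watch\?v=([A-Za-z0-9._%-]*)[&\w;=\+_\-]*/';

$urls = [
    'http://youtube.com/watch?v=-j5J7lXav7Y',                           // leading dash
    'http://www.youtube.com/watch?v=szygGmDsAl4&search_query=testing', // extra params
];

$ids = [];
foreach ($urls as $url) {
    // The ID class has no "&", so the capture stops before the next parameter.
    $ids[] = preg_match($pattern, $url, $m) ? $m[1] : null;
}
print_r($ids);
```

Because the group uses *, it can also match an empty ID (a URL ending in "v="); using + instead would rule that out. The trailing [&\w;=\+_\-]* does not affect what is captured.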

Another solution

lord_of_freaks commented 6 August 2008 at 11:57

Here is my solution

/youtube\.com\/watch\?v=([^&]+)/i

here it is

ReaALFAnTaSy commented 2 March 2010 at 11:29

'/v=[^&]*/'
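This pattern has no capturing group, so the only result is $matches[0], which still begins with "v=". A sketch of using it as posted, plus one group-free alternative that relies on PCRE's \K escape (supported by PHP's preg functions) to drop "v=" from the reported match:

```php
<?php
$url = 'http://www.youtube.com/watch?v=szygGmDsAl4&search_query=testing';

// No capturing group: only $matches[0] exists, and it includes "v=".
preg_match('/v=[^&]*/', $url, $matches);
$whole = $matches[0];          // "v=szygGmDsAl4"
$id    = substr($whole, 2);    // strip the leading "v="

// \K resets the start of the reported match, so $k[0] is just the ID.
preg_match('/v=\K[^&]+/', $url, $k);
```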

RE that works for me

noquery commented 23 October 2010 at 18:23

I use the RE below on my site, and it works perfectly. Its subpattern returns the YouTube video ID, while the complete RE returns the YouTube URL from "youtube.com" up to the next whitespace.

/youtube.com\/watch\?v=(\S*)/
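A sketch of how \S* behaves, using hypothetical text where the link is surrounded by spaces. It returns just the ID only when nothing follows it in the URL; with extra query parameters the subpattern keeps going until whitespace.

```php
<?php
$pattern = '/youtube.com\/watch\?v=(\S*)/';

// Link with only the v parameter.
$text1 = 'Watch http://www.youtube.com/watch?v=szygGmDsAl4 now';
preg_match($pattern, $text1, $m1);

// Link with an extra parameter: \S* also swallows "&search=Search".
$text2 = 'Watch http://www.youtube.com/watch?v=szygGmDsAl4&search=Search now';
preg_match($pattern, $text2, $m2);

echo $m1[1], "\n";
echo $m2[1], "\n";
```

Replacing (\S*) with ([^&\s]+) would stop at either a "&" or whitespace.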
