By erinc on
Hello,
I'm trying to extract the video ID from a YouTube video URL.
Example:
http://www.youtube.com/watch?v=j4qJ0cKeeka&mode=related&search=something
http://www.youtube.com/watch?v=j4qJ0cKeeka
I plan to use the code below, but I can't figure out which pattern to match against.
preg_match('???', $url, $matches)
Thanks in advance for any help.
Comments
This may be incorrect but
This may be incorrect but you should look into sub-patterns. I think this may be close (not tested)
'/v=(\w+)&/'
Thanks
Adam
Thanks it works partially.
Thanks, it works partially. It doesn't match the - (minus) character. For example, if the video ID is 23d23-d232, it doesn't match. Do you know how I should modify it? Thanks again.
Sorry I didn't realise that
Sorry I didn't realise that it may have dashes in. I only went on the url you posted.
You should use instead:
'/v=(.+)&/'
The '\w' pattern only matches letters, digits and the underscore (a-z, A-Z, 0-9, _), whereas the '.' pattern matches any character except a newline.
Thanks
Adam
Yes, it matched dashes now
Yes, it matched dashes now thanks.
I have another small issue now.
With the above I get the following array:
Array ( [0] => v=szygGmDsAl4&search_query=asdasda& [1] => szygGmDsAl4&search_query=asdasda )
I found a bad
I found a bad trick...
This works, but I would love to do it with just regular expressions.
try... <?php$url = "http://
try...
You might need to escape the ampersand in the pattern (put a backslash before it)...
The ? means "don't be greedy".
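The snippet this reply referred to has not survived in the thread. As a rough sketch only, assuming it was a non-greedy variant of the earlier pattern, it may have looked something like this (the function name `extract_id_lazy` is made up for illustration):

```php
<?php
// Hypothetical helper, not taken from the original post.
function extract_id_lazy($url) {
    // .+? is the lazy form of .+ : it stops at the first "&" instead of the last one.
    if (preg_match('/v=(.+?)&/', $url, $matches)) {
        return $matches[1];
    }
    return null; // no "v=...&" found in the URL
}

$url = 'http://www.youtube.com/watch?v=szygGmDsAl4&search_query=asdasda&';
echo extract_id_lazy($url), "\n"; // szygGmDsAl4
```

Like the pattern it replaces, this still expects an "&" somewhere after the ID.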
For more info:
http://www.regular-expressions.info/
Thank you, that works
Thank you, that works perfectly. I need to do some regex reading now :)
Hmm, when the $url is just
Hmm, when the $url is just $url = "http://www.youtube.com/watch?v=szygGmDsAl4" (doesn't include a "&"), the pattern doesn't match anything.
$pattern = '/v=([^&]+)/';
[^&]+ matches any run of characters that are NOT an ampersand, so the match stops at the first & or at the end of the string.
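A small sketch of that pattern run against the URLs from earlier in the thread. The helper name `extract_id` is hypothetical; only the pattern comes from the post:

```php
<?php
// Hypothetical helper name; the pattern is the one quoted above.
function extract_id($url) {
    // [^&]+ captures one or more characters up to the next "&" or the end of the string.
    return preg_match('/v=([^&]+)/', $url, $matches) ? $matches[1] : null;
}

$urls = array(
    'http://www.youtube.com/watch?v=j4qJ0cKeeka&mode=related&search=something',
    'http://www.youtube.com/watch?v=j4qJ0cKeeka',
    'http://www.youtube.com/watch?v=23d23-d232',
);
foreach ($urls as $url) {
    echo extract_id($url), "\n";
}
```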
There's any number of ways of attacking it, depending on how many times you want to keep changing the ground rules ;)
A true solution would be to use one of PHP's built-in URL-parsing routines. They are designed to be bullet-proof. You'll find examples on php.net in the comments for parse_url and similar.
Regexps can always find ways to be broken. Imagine if your source URL was strictly URL-encoded, with %26 for the &. It's valid, but the regexp will miss it.
.dan.
How to troubleshoot Drupal | http://www.coders.co.nz/
.dan. is the New Zealand Drupal Developer working on Government Web Standards
Urlencoding
If the ampersand is urlencoded, then it is still part of the value, and the regexp will do the right thing. You will need to call urldecode() afterwards, just like you need to do with parse_url().
--
If you have a problem, please search before posting a question.
Whoops. off-by-one error
Sorry, it wasn't url-encoding I was thinking of, it was xhtml entity-encoding.
A bare & inside a link's URL is invalid XHTML. If you are running strict, it should be written as &amp;.
XML Sources, eg RSS feeds, should probably be using this syntax if they are pretending to be valid, and not just raw-text-that-happens-to-look-like-html.
I imagine &#38; will also work, but I've encountered inconsistencies when using a dozen different XML processors. Anyway, either is a pain to parse with just regexps.
.dan.
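One way to cope with entity-encoded sources is to decode the entities before looking at the URL at all. A minimal sketch, assuming the URL was lifted from an XHTML or RSS source where "&" is written as "&amp;" (the variable names are made up for illustration):

```php
<?php
// As it might appear inside an XHTML or RSS source: "&" written as "&amp;".
$encoded = 'http://www.youtube.com/watch?v=j4qJ0cKeeka&amp;mode=related';

// Turn entities such as &amp; and &#38; back into plain characters.
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');

// Split the query string into its parameters.
$params = array();
parse_str((string) parse_url($decoded, PHP_URL_QUERY), $params);

echo $params['v'], "\n";    // j4qJ0cKeeka
echo $params['mode'], "\n"; // related
```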
Without a regular expression
You should have no problems with this, and perhaps a bit more the PHP way...
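The code posted with this comment has not survived. A minimal sketch of the approach, assuming it used parse_url() and parse_str() (the variable name `$parsed_query` is taken from the sentence that follows; everything else is a guess):

```php
<?php
$url = 'http://www.youtube.com/watch?v=j4qJ0cKeeka&mode=related&search=something';

// Pull out just the query string: "v=j4qJ0cKeeka&mode=related&search=something".
$query = parse_url($url, PHP_URL_QUERY);

// Split the query string into an associative array (values are URL-decoded).
$parsed_query = array();
if (is_string($query)) {
    parse_str($query, $parsed_query);
}

echo isset($parsed_query['v']) ? $parsed_query['v'] : '(no video ID)', "\n";
```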
The video ID is of course $parsed_query['v'].
--
Ross Kendall
UK based Web and IT consultant specialising in Free and Open Source Software technologies.
http://rosskendall.com
That is indeed MUCH better
That is indeed MUCH better and accurate.
... when you already have the URL on a string of its own.
You'd choose to use regexps, however, when parsing an entire lump of random HTML.
... so you'll need to combine a regexp that locates all links (or embeds?) and then calls back (regexp execute) to this URL-parser.
Exercise for the reader :)
(or use the HTML link-extor or WTF it's called)
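For what it's worth, one possible shape for that exercise. This is only a sketch: the helper name `find_video_ids` is hypothetical, it only looks at double-quoted href attributes, and a real HTML parser would be more robust than the regexp in step 1.

```php
<?php
// Hypothetical helper: collect YouTube video IDs from a chunk of HTML.
function find_video_ids($html) {
    $ids = array();
    // Step 1: a deliberately loose regexp that grabs every double-quoted href value.
    preg_match_all('/href="([^"]+)"/i', $html, $matches);
    foreach ($matches[1] as $href) {
        // Step 2: undo entity-encoding (&amp; -> &), then let the URL parser do the rest.
        $url = html_entity_decode($href, ENT_QUOTES, 'UTF-8');
        $host = parse_url($url, PHP_URL_HOST);
        $query = parse_url($url, PHP_URL_QUERY);
        if (!is_string($host) || !is_string($query)) {
            continue; // malformed URL, or no host / query string
        }
        // Accept youtube.com and its subdomains only.
        if (!preg_match('/(^|\.)youtube\.com$/i', $host)) {
            continue;
        }
        parse_str($query, $params);
        if (isset($params['v']) && is_string($params['v'])) {
            $ids[] = $params['v'];
        }
    }
    return $ids;
}

$html = '<p><a href="http://www.youtube.com/watch?v=j4qJ0cKeeka&amp;mode=related">one</a> '
      . '<a href="http://example.com/watch?v=nope">two</a></p>';
print_r(find_video_ids($html));
```

Checking the host with parse_url() rather than matching "youtube.com" anywhere in the string keeps look-alike domains out while still allowing subdomains.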
.dan.
One more way ...
Sorry to reply to such an old post, but I just wrote a regex for this very task. If nothing else, it will be good to get feedback from the more regex-savvy folks.
(?:[Yy][Oo][Uu][Tt][Uu][Bb][Ee]\.[Cc][Oo][Mm]/watch\?v=)(\w[\w|-]*)
This version matches the following URLs, only returning the text that I'm interested in (i.e. no "v=").
Of course, it's susceptible to phishing from domains that end with "youtube.com" (like "www.PhishingForYouTube.com"), but I wanted this to support feeds from subdomains of youtube like the ones used for their API pages.
BTW ...
By the way, I didn't bother to make the "/watch" part case-insensitive because then the link wouldn't be valid. Someone set up a profile under the username "watch". If "watch" is typed using mixed-case letters, you'll visit that profile, regardless of the text that follows it.
Thanks for this. A problem
Thanks for this.
A problem with the above regex is that it won't match YouTube URLs like this:
http://youtube.com/watch?v=-j5J7lXav7Y
I've tried quite a few variations, without success. I'll post a solution here once found.
Regards,
.
Here is a regex that solves
Here is a regex that solves the above issue, and properly handles virtually all youtube video urls:
/youtube\.com\/watch\?v=([A-Za-z0-9._%-]*)[&\w;=\+_\-]*/
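As a quick check of the pattern above against URLs mentioned earlier in the thread, a small sketch (the helper name `youtube_id` is hypothetical; the pattern is quoted unchanged):

```php
<?php
// Hypothetical helper wrapping the pattern from the post above.
function youtube_id($url) {
    $pattern = '/youtube\.com\/watch\?v=([A-Za-z0-9._%-]*)[&\w;=\+_\-]*/';
    // $matches[1] holds the ID; $matches[0] holds the whole matched URL tail.
    return preg_match($pattern, $url, $matches) ? $matches[1] : null;
}

echo youtube_id('http://youtube.com/watch?v=-j5J7lXav7Y'), "\n"; // -j5J7lXav7Y
echo youtube_id('http://www.youtube.com/watch?v=j4qJ0cKeeka&mode=related&search=something'), "\n"; // j4qJ0cKeeka
```

Because the first character class includes the dash, an ID that starts with "-" is captured in full.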
Another solution
Here is my solution
/youtube\.com\/watch\?v=([^&]+)/i
here it is
'/v=[^&]*/'
RE that works for me
I use the RE below on my site, and it works perfectly. Its sub-pattern returns the YouTube video ID, while the complete match returns the whole YouTube URL up to the next whitespace.
/youtube\.com\/watch\?v=(\S*)/