PHP: How to parse out HTML comments


Method 1: Using the preg_match_all function with a pattern that matches any character, including line breaks:

preg_match_all("/<!--(.|\s)*?-->/", $html, $matches);

Example:

<?php

$html = <<<HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!--
<p>! Athletic therapy - practitioners are highly skilled health
-->
<br />
<p>Athletic therapy practitioners are highly skilled health care professionals, with similar scope of practice as physiotherapists that provide immediate treatment to musculoskeletal injuries. ATs employ a sports medicine model of rehabilitation to physical injuries incurred from sports, recreation, accidents, daily activities or occupation. Early exercise prescription is often given to aggressively heal soft tissue injuries and to maintain/increase mobility.</p>
<p>The treatments offered are always one on one and usually 30-60 minutes in length. Treatments consist of manual therapy, including soft tissue therapy &amp; joint mobilization, core strengthening &amp; therapeutic exercise prescription, supportive taping &amp; bracing, postural correction, proprioceptive neuromuscular facilitation, neuromuscular retraining, nutritional advice &amp; supplement recommendation, and the use of traditional modalities (ultrasound, IFC, TENS, laser, NMES). Almost every modality available to physiotherapists is also used by these health practitioners.</p>

<p>Many extended health plans cover these treatments with a doctor's referral.</p>
<p>To become a certified athletic therapist, graduates of accredited universities need to successfully pass the athletic therapy board exams administered by International Board of Certified Athletic Therapists (IBCAT). Upon successful completion of the board exams; graduates are permitted to use the titles: CAT, DIBCAT; which stand for Certified Athletic Therapists, Diplomate of the International Board of Certified Athletic Therapists.</p>
<p>With a CAT, DIBCAT title, athletic therapists are permitted to work as athletic therapists everywhere; including the USA (all states), Canada (all provinces), Australia (all provinces), New Zealand, United Kingdom, South Africa, Japan, South Korea, Germany, Mexico, Brazil, India, China, Spain, France, Italy, Latvia, Iran, &amp; the Netherlands.</p>
<p>To learn about how to become an athletic therapist please visit website of the National University of Medical Sciences at <a rel="nofollow" onclick="javascript:_gaq.push(['_trackPageview', '/outgoing/article_exit_link/6435039']);" href="http://www.2tuts.com">http://www.2tuts.com</a>. NUMSS offers a  master of science in athletic therapy which takes one year full time to complete. The program is offered online with optional campus based practical technique classes.</p>
HTML;

if(preg_match_all("/<!--(.|\s)*?-->/", $html, $matches)){
    var_dump($matches);
}

?>

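Result (a repeated capture group keeps only its last iteration, so $matches[1] holds just the final character before "-->", here a line break):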

array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(74) "<!--
<p>! Athletic therapy - practitioners are highly skilled health
-->"
  }
  [1]=>
  array(1) {
    [0]=>
    string(1) "
"
  }
}

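Method 2: Using the s (DOTALL) modifier, so that "." also matches line breaks and the whole comment body is captured in group 1: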

preg_match_all("/<!--(.*?)-->/s", $html, $matches);

Example:

<?php

$html = <<<HTML
<p>Horse racing is an equestrian sport that has been popular throughout the centuries.</p>
<p>Horse racing is one of the most attended and most enjoyed sports events in the world.</p> 
<!-- Horse racing is one of the most attended and most enjoyed sports events in the world -->
<p>Many people do not realize that there are different types of horse racing.</p> 
<p>The following is a list of some of the various forms of horse racing that are currently practiced throughout the world:</p>          		            
HTML;

if(preg_match_all("/<!--(.*?)-->/s", $html, $matches)){
    var_dump($matches);
}

?>

Result:

array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(93) "<!-- Horse racing is one of the most attended and most enjoyed sports events in the world -->"
  }
  [1]=>
  array(1) {
    [0]=>
    string(86) " Horse racing is one of the most attended and most enjoyed sports events in the world "
  }
}
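
The comment bodies captured in $matches[1] usually carry surrounding whitespace. A minimal sketch of how they could be collected and trimmed, assuming a hypothetical helper named commentBodies() that is not part of the original examples:

```php
<?php

// Hypothetical helper (an assumption for illustration): returns the trimmed text of each comment.
function commentBodies(string $html): array
{
    // The s modifier lets "." match line breaks, so multi-line comments are found too.
    if (!preg_match_all('/<!--(.*?)-->/s', $html, $matches)) {
        return []; // no match (0) or a PCRE error (false)
    }

    // $matches[1] holds the first capture group of every match.
    return array_map('trim', $matches[1]);
}

var_dump(commentBodies("<p>a</p><!-- first -->\n<!--\nsecond\n-->"));
```

This is still a plain regex scan, so it shares the limitation shown in the Bug example.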

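Bug: the regex does not understand HTML structure, so a "<!--" inside an attribute value followed by a "-->" in the text is wrongly reported as a comment: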

<?php

$html = <<<HTML
<p title="Welcome <!-- tutorialspots.com">free tutorial--></p>          		            
HTML;

if(preg_match_all("/<!--(.*?)-->/s", $html, $matches)){
    var_dump($matches);
}

?>

Result:

array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(40) "<!-- tutorialspots.com">free tutorial-->"
  }
  [1]=>
  array(1) {
    [0]=>
    string(33) " tutorialspots.com">free tutorial"
  }
}
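
One way to avoid this false positive is to let an HTML parser decide what a comment is. The following is a sketch rather than a definitive fix. It assumes the DOM extension is available (it is enabled by default in PHP), that the input is ASCII or otherwise safe for DOMDocument::loadHTML(), and it uses a hypothetical helper named extractComments():

```php
<?php

// Hypothetical helper (an assumption for illustration): collects comment nodes with an HTML parser.
function extractComments(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $comments = [];
    $xpath = new DOMXPath($doc);
    // comment() selects comment nodes only; attribute values and text nodes are not matched.
    foreach ($xpath->query('//comment()') as $node) {
        $comments[] = $node->nodeValue;
    }

    return $comments;
}

// Input from the Bug example: "<!--" appears only inside the title attribute.
var_dump(extractComments('<p title="Welcome <!-- tutorialspots.com">free tutorial--></p>'));

// A real comment between two paragraphs.
var_dump(extractComments('<p>a</p><!-- note --><p>b</p>'));
```

With the input from the Bug example the parser should find no comment nodes, while the second call should return the single comment body. The trade-off is that the parser may normalize malformed markup, so this approach suits extraction better than byte-exact rewriting of the source.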
