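I have built quite a few products that scrape content from other websites, and I got used to the quick and convenient file_get_contents function. It keeps failing to fetch, though. Even with the timeout set as in the manual's example, most of the time the setting has no effect: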
$config['context'] = stream_context_create(array('http' => array('method' => "GET",
'timeout' => 5 // this timeout is unreliable and often has no effect
)
));
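For reference, here is a minimal sketch of how a context like this might be passed to file_get_contents with the failure handled explicitly. The helper name fetch_with_context and its default value are assumptions for illustration, not part of the original code. The PHP manual describes the http `timeout` option as a read timeout, which may be part of why it does not behave like a hard limit on the whole request.

```php
<?php
// Hypothetical helper (not from the original article): fetch a URL through
// the http stream wrapper and return null on failure instead of false.
function fetch_with_context(string $url, float $timeout = 5.0): ?string
{
    $context = stream_context_create(array(
        'http' => array(
            'method'  => 'GET',
            'timeout' => $timeout, // read timeout in seconds
        ),
    ));

    // @ suppresses the "failed to open stream" warning; the return value
    // is checked explicitly below instead.
    $body = @file_get_contents($url, false, $context);

    // file_get_contents returns false on failure, so compare strictly
    // (an empty response body is a valid result).
    return $body === false ? null : $body;
}
```

A caller can then test for null and retry or log the URL, rather than letting warnings pile up.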
If you then look at the server's connection pool, you will find a pile of errors like the following, which is a real headache:
file_get_contents(http://***): failed to open stream…
As a last resort, I installed the curl library and wrote a function to replace it:
function curl_file_get_contents($durl){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $durl);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5); // give up after 5 seconds
    // _USERAGENT_ and _REFERER_ are constants that must be defined elsewhere
    curl_setopt($ch, CURLOPT_USERAGENT, _USERAGENT_);
    curl_setopt($ch, CURLOPT_REFERER, _REFERER_);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
    $r = curl_exec($ch);
    curl_close($ch);
    return $r;
}
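As a possible refinement, the sketch below adds a separate connect timeout and returns the cURL error message when a request fails. The function name curl_fetch, the returned array shape, and the default values are assumptions for illustration. It is a variation on the function above, not the author's code.

```php
<?php
// Hypothetical variant (names and defaults are assumptions): adds a connect
// timeout and reports cURL errors to the caller.
function curl_fetch(string $url, int $timeout = 5): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // limit for the connection phase
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);        // limit for the whole transfer
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // return the body as a string

    $body = curl_exec($ch);

    if ($body === false) {
        // curl_error() returns a readable message, for example for a
        // timeout or a refused connection.
        return array('body' => null, 'error' => curl_error($ch));
    }

    return array('body' => $body, 'error' => null);
}
```

Returning the error text alongside the body lets the caller distinguish a timeout from other failures when deciding whether to retry.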
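With this replacement function in place, nothing else went wrong apart from genuine network problems.
Here is a test someone else ran comparing curl and file_get_contents:
Seconds taken by file_get_contents to fetch google.com: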
2.31319094
2.30374217
2.21512604
3.30553889
2.30124092
Seconds taken by curl:
0.68719101
0.64675593
0.64326
0.81983113
0.63956594
Quite a gap, isn't it? In my experience the two tools differ not only in speed but also in stability. If you need reliable network scraping, I recommend the curl_file_get_contents function above: it is stable and fast, and it can also pose as a browser to the target site.
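The original test does not say how the timings were taken. A comparison of this kind could be reproduced with something like the sketch below, where the helper name time_fetch and the run count are assumptions for illustration.

```php
<?php
// Hypothetical timing helper: calls $fetch($url) $runs times and returns
// the elapsed wall-clock seconds for each call.
function time_fetch(callable $fetch, string $url, int $runs = 5): array
{
    $timings = array();
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        $fetch($url); // the result is discarded; only the duration matters
        $timings[] = microtime(true) - $start;
    }
    return $timings;
}

// Example usage (needs network access, the curl extension, and the
// _USERAGENT_ and _REFERER_ constants defined for the curl function):
// print_r(time_fetch('file_get_contents', 'http://google.com/'));
// print_r(time_fetch('curl_file_get_contents', 'http://google.com/'));
```

Results will vary with the network, DNS caching, and the target server, so several runs at different times give a fairer picture than a single batch.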