
Re: Javascript emulator?

Posted: 26 Aug 2015, 13:46
by zanzibar1982

Re: Javascript emulator?

Posted: 26 Aug 2015, 21:34
by robalo
Ok :)

I've seen that, when this failed,

Code:

    action = "findvideos"
    if "serie-tv" in item.url: action = "episodios"
you duplicated the 'fichas' function for series and modified the 'search' function with an extra entry so that it calls either 'fichas' or the series duplicate, 'serietv'. That's not bad, but with a few small changes the code can be reused instead of duplicated.
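To illustrate the idea on its own, here is a minimal sketch (not the channel code itself): a single helper picks the action from the scraped title, so one 'fichas' function can serve both films and series. The helper name `pick_action` and the sample titles are made up for this example, and it assumes that series entries carry a "(Serie TV)" marker in their title.

```python
def pick_action(scrapedtitle):
    """Return the action name for a listing entry based on its title.

    Assumption: series entries carry a "(serie tv)" marker in the title;
    everything else is treated as a single film.
    """
    if "(serie tv)" in scrapedtitle.lower():
        return "episodios"
    return "findvideos"


# Hypothetical titles, only for illustration.
print(pick_action("Some Show (Serie TV)"))
print(pick_action("Some Film (2015)"))
```

Deciding per entry, rather than per listing URL, also covers search results, where films and series can appear mixed in the same page.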

I'm pasting the code I use for testing, for information only, not as a correction :)

Code:

# -*- coding: utf-8 -*-
#------------------------------------------------------------
# pelisalacarta - XBMC Plugin
# Channel for itafilmtv
# http://blog.tvalacarta.info/plugin-xbmc/pelisalacarta/
#------------------------------------------------------------
import urlparse,urllib2,urllib,re
import os, sys

from core import scrapertools
from core import logger
from core import config
from core.item import Item
from servers import servertools

__channel__ = "itafilmtv"
__category__ = "F,S"
__type__ = "generic"
__title__ = "ITA Film TV"
__language__ = "IT"

host = "http://www.itafilm.tv"

headers = [
    ['Host','www.itafilm.tv'],
    ['User-Agent','Mozilla/5.0 (Windows NT 6.1; WOW64; rv:39.0) Gecko/20100101 Firefox/39.0'],
    ['Accept-Encoding','gzip, deflate'],
    ['Cookie','_ddn_intercept_2_=b33473ad0b70b320a9f7546e213a396a']
]

def isGeneric():
    return True

def mainlist( item ):
    logger.info( "[itafilmtv.py] mainlist" )

    itemlist = []

    itemlist.append( Item( channel=__channel__, action="fichas", title="Home", url=host ) )
    itemlist.append( Item( channel=__channel__, action="fichas", title="Serie TV", url=host + "/telefilm-serie-tv-streaming/" ) )
    itemlist.append( Item( channel=__channel__, action="search", title="Buscar...", url=host ) )

    return itemlist

## Because the function is named "search", the launcher asks for a search text and passes it in as a parameter
def search( item, texto ):
    logger.info( "[itafilmtv.py] " + item.url + " search " + texto )

    item.url+= "/?do=search&subaction=search&search_start=1&story=" + texto

    try:
        return fichas( item )

    ## The exception is caught so that a failing channel does not interrupt the global search
    except:
        import sys
        for line in sys.exc_info():
            logger.error( "%s" % line )
        return []

def fichas( item ):
    logger.info( "[itafilmtv.py] fichas" )

    itemlist = []

    ## Download the page
    data = scrapertools.cache_page( item.url, headers=headers )

    if "do=search" in item.url:
        search_pages = re.compile( 'javascript:list_submit.(\d+).', re.DOTALL ).findall( data )
        for next_search_page in range( 2, len( search_pages ) + 2 ):
            item.url = re.sub( r'search_start=(\d+)', 'search_start=%s' % next_search_page, item.url)
            data+= scrapertools.cache_page( item.url, headers=headers )

    ## Extract the data
    patron = '<div class="main-news">.*?'
    patron+= '<div class="main-news-image"[^<]+'
    patron += '<a href="([^"]+)">'
    patron += '<img src="([^"]+)" '
    patron += 'alt="([^"]+)"'

    matches = re.compile( patron, re.DOTALL ).findall( data )

    for scrapedurl, scrapedthumbnail, scrapedtitle in matches:

        action = "findvideos"
        if "(serie tv)" in scrapedtitle.lower(): action = "episodios"

        scrapedtitle = scrapertools.decodeHtmlentities( scrapedtitle )

        itemlist.append( Item( channel=__channel__, action=action, title=scrapedtitle, url=scrapedurl, thumbnail=urlparse.urljoin( host, scrapedthumbnail ), fulltitle=scrapedtitle, show=scrapedtitle ) )

    ## Pagination
    next_page = scrapertools.find_single_match( data, '<span>\d+</span> <a href="([^"]+)">' )
    if next_page != "":
        itemlist.append( Item( channel=__channel__, action="fichas" , title=">> Página siguiente" , url=next_page ) )

    return itemlist

def episodios( item ):
    logger.info( "[itafilmtv.py] episodios" )

    itemlist = []

    ## Download the page
    data = scrapertools.cache_page( item.url, headers=headers )

    plot = scrapertools.htmlclean(
        scrapertools.get_match( data, '<div class="main-news-text main-news-text2">(.*?)</div>' )
    ).strip()

    ## Extract the data - Episodes
    patron = '<br />(\d+x\d+).*?href="//ads.ad-center.com/[^<]+</a>(.*?)<a href="//ads.ad-center.com/[^<]+</a>'
    matches = re.compile( patron, re.DOTALL ).findall( data )
    if len( matches ) == 0:
        patron = ' />(\d+x\d+)(.*?)<br'
        matches = re.compile( patron, re.DOTALL ).findall( data )

    logger.info( "[itafilmtv.py] episodios matches ## %s ##" % matches )

    ## Extract the data - sub ITA/ITA
    patron = '<b>.*?STAGIONE.*?(sub|ITA).*?</b>'
    lang = re.compile( patron, re.IGNORECASE ).findall( data )

    lang_index = 0
    for scrapedepisode, scrapedurls in matches:

        if int( scrapertools.get_match( scrapedepisode, '\d+x(\d+)' ) ) == 1:
            lang_title = lang[lang_index]
            if lang_title.lower() == "sub": lang_title+= " ITA"
            lang_index+= 1

        title = scrapedepisode + " - " + item.show + " (" + lang_title + ")"
        scrapedurls = scrapedurls.replace( "playreplay", "moevideo" )

        matches_urls = re.compile( 'href="([^"]+)"', re.DOTALL ).findall( scrapedurls )
        urls = ""
        for url in matches_urls:
            urls+= url + "|"

        if urls != "":
            itemlist.append( Item( channel=__channel__, action="findvideos", title=title, url=urls[:-1], thumbnail=item.thumbnail, plot=plot, fulltitle=item.fulltitle, show=item.show ) )

    return itemlist

def findvideos( item ):
    logger.info( "[itafilmtv.py] findvideos" )

    itemlist = []

    ## Extract the data
    if "|" not in item.url:
        ## Download the page
        data = scrapertools.cache_page( item.url, headers=headers )

        sources = scrapertools.get_match( data, '(<noindex> <div class="video-player-plugin">.*?</noindex>)')

        patron = 'src="([^"]+)"'
        matches = re.compile( patron, re.DOTALL ).findall( sources )
    else:
        matches = item.url.split( '|' )

    for scrapedurl in matches:

        server = scrapedurl.split( '/' )[2].split( '.' )
        if len(server) == 3: server = server[1]
        else: server = server[0]

        title = "[" + server + "] " + item.fulltitle

        itemlist.append( Item( channel=__channel__, action="play" , title=title, url=scrapedurl, thumbnail=item.thumbnail, fulltitle=item.fulltitle, show=item.show, folder=False ) )

    return itemlist

def play( item ):
    logger.info( "[itafilmtv.py] play" )

    ## Only the url is needed
    data = item.url

    itemlist = servertools.find_video_items( data=data )

    for videoitem in itemlist:
        videoitem.title = item.show
        videoitem.fulltitle = item.fulltitle
        videoitem.thumbnail = item.thumbnail
        videoitem.channel = __channel__

    return itemlist
I've also added three lines to the episode-extraction step in 'episodios' (a fallback pattern) for series that have no advertising links, such as 'Scandal'.
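A small standalone sketch of that fallback, for anyone who wants to try it outside the plugin: the two patterns are the ones from 'episodios', while the helper name `find_episodes` and both sample strings are invented for this example and are not taken from the site.

```python
import re

# Primary pattern: the episode links sit between two ad links.
PATRON_ADS = r'<br />(\d+x\d+).*?href="//ads.ad-center.com/[^<]+</a>(.*?)<a href="//ads.ad-center.com/[^<]+</a>'
# Fallback pattern: episodes listed without any ad links.
PATRON_PLAIN = r' />(\d+x\d+)(.*?)<br'


def find_episodes(data):
    """Try the ad-link pattern first, then fall back to the plain one."""
    matches = re.compile(PATRON_ADS, re.DOTALL).findall(data)
    if len(matches) == 0:
        matches = re.compile(PATRON_PLAIN, re.DOTALL).findall(data)
    return matches


# Hypothetical markup WITHOUT ad links: only the fallback pattern matches.
sample_plain = (
    '<br />1x01 <a href="http://example.com/v/1">Link</a><br />'
    '1x02 <a href="http://example.com/v/2">Link</a><br />'
)

# Hypothetical markup WITH ad links: the primary pattern matches.
sample_ads = (
    '<br />1x01 <a href="//ads.ad-center.com/x">Ad</a> '
    '<a href="http://example.com/v/1">Link</a> '
    '<a href="//ads.ad-center.com/y">Ad</a>'
)

for episode, links in find_episodes(sample_plain) + find_episodes(sample_ads):
    print("%s -> %s" % (episode, re.findall(r'href="([^"]+)"', links)))
```

The fallback only runs when the first pattern finds nothing, so pages that do have ad links keep behaving as before.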

Re: Javascript emulator?

Posted: 27 Aug 2015, 01:35
by zanzibar1982
Magic robalo :D

I've updated the itafilm.tv channel, thanks a lot!

Speaking of Cloudflare (I wonder whether it deserves a thread of its own): http://itastreaming.co/ and another site, http://altadefinizione.click/, are both using it. User fenice82 is trying to develop his first channel for altadefinizione.click and is having a hard time, I guess.

Could this be of any help to our cause?

https://github.com/Anorov/cloudflare-scrape

Anyway, as Jesus already said, if a website causes too many problems, the best choice is to leave it alone.

Re: Javascript emulator?

Posted: 27 Aug 2015, 08:52
by robalo
I agree with Jesus, but these channels can serve us as an exercise.

I've replied to fenice82 in viewtopic.php?f=23&t=6930

Re: Javascript emulator?

Posted: 27 Aug 2015, 15:58
by DrZ3r0
Hi,
The itastreaming.co channel has an anti_cloudflare function.
Just use this function in other channels to get past Cloudflare.

Re: Javascript emulator?

Posted: 27 Aug 2015, 16:09
by zanzibar1982
DrZ3r0 wrote: Hi,
The itastreaming.co channel has an anti_cloudflare function.
Just use this function in other channels to get past Cloudflare.

For some reason the itastreaming.co channel is not working on Android devices;
it returns an error as soon as you select it.

I'll post a log as soon as I can.