Javascript emulator?

Posted: 14 Aug 2015, 15:52
by zanzibar1982
Hello,

In order to download the page, http://www.itafilm.tv/ requires JavaScript to be enabled.

Because of this I cannot extract data from it.

Is there a way to fake/emulate javascript so that the page allows pelisalacarta access?

Re: Javascript emulator?

Posted: 14 Aug 2015, 18:50
by robalo
Hello zanzibar

The fact that the website returns "You have to turn on javascript and cookies support in browser to visit this site." does not mean that Python has to emulate/interpret JS.

Let me stress this again: "My recommendation for creating a channel is to start in the simplest way possible, watch the Kodi log and, if necessary, check that the cookies are being written to 'cookies.dat'".

Here is working code that you can build on and customize. If it doesn't work for you, you'll see that the page is already telling you what you have to put in 'Cookie'. You only need to extract the content with "document.cookie='([^;]+);" and put it in the 'Cookie' entry of 'headers'.

Code:

# -*- coding: utf-8 -*-
#------------------------------------------------------------
# pelisalacarta - XBMC Plugin
# Canal para itafilmtv
# http://blog.tvalacarta.info/plugin-xbmc/pelisalacarta/
#------------------------------------------------------------
import urlparse,urllib2,urllib,re
import os, sys

from core import scrapertools
from core import logger
from core import config
from core.item import Item
from servers import servertools

__channel__ = "itafilmtv"
__category__ = "F,S"
__type__ = "generic"
__title__ = "ITA Film TV"
__language__ = "IT"

host = "http://www.itafilm.tv"

headers = [
    ['Host','www.itafilm.tv'],
    ['User-Agent','Mozilla/5.0 (Windows NT 6.1; WOW64; rv:39.0) Gecko/20100101 Firefox/39.0'],
    ['Accept-Encoding','gzip, deflate'],
    ['Cookie','_ddn_intercept_2_=b33473ad0b70b320a9f7546e213a396a']
]

def isGeneric():
    return True

def mainlist( item ):
    logger.info( "[itafilmtv.py] mainlist" )

    itemlist = []

    itemlist.append( Item( channel=__channel__, action="fichas", title="Home", url=host ) )
    itemlist.append( Item( channel=__channel__, action="search", title="Buscar...", url=host ) )

    return itemlist

## Because the function is named "search", the launcher asks for a search text and passes it in as a parameter
def search( item, texto ):
    logger.info( "[itafilmtv.py] " + item.url + " search " + texto )

    item.url+= "/?do=search&subaction=search&story=" + texto

    try:
        return fichas( item )

    ## Catch the exception so that a failing channel does not interrupt the global search
    except:
        import sys
        for line in sys.exc_info():
            logger.error( "%s" % line )
        return []

def fichas( item ):
    logger.info( "[itafilmtv.py] fichas" )

    itemlist = []

    ## Download the page
    data = scrapertools.cache_page( item.url, headers=headers )

    ## Extract the data
    patron = '<div class="main-news">.*?'
    patron+= '<div class="main-news-image"[^<]+'
    patron += '<a href="([^"]+)">'
    patron += '<img src="([^"]+)" '
    patron += 'alt="([^"]+)"'

    matches = re.compile( patron, re.DOTALL ).findall( data )

    for scrapedurl, scrapedthumbnail, scrapedtitle in matches:

        scrapedtitle = scrapertools.decodeHtmlentities( scrapedtitle )

        itemlist.append( Item( channel=__channel__, action="findvideos" , title=scrapedtitle, url=scrapedurl, thumbnail=urlparse.urljoin( host, scrapedthumbnail ), fulltitle=scrapedtitle, show=scrapedtitle ) )

    ## Pagination
    next_page = scrapertools.find_single_match( data, r'<span>\d+</span> <a href="([^"]+)">' )
    if next_page != "":
        itemlist.append( Item( channel=__channel__, action="fichas" , title=">> Página siguiente" , url=next_page ) )

    return itemlist

def findvideos( item ):
    logger.info( "[itafilmtv.py] findvideos" )

    itemlist = []

    ## Download the page
    data = scrapertools.cache_page( item.url, headers=headers )

    sources = scrapertools.get_match( data, '(<noindex> <div class="video-player-plugin">.*?</noindex>)')

    ## Extract the data
    patron = 'src="([^"]+)"'

    matches = re.compile( patron, re.DOTALL ).findall( sources )

    for scrapedurl in matches:

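        ## Derive the server name from the host, e.g. "videomega" from "videomega.tv" or "www.videomega.tv"
        server = scrapedurl.split( '/' )[2].split( '.' )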
        if len(server) == 3: server = server[1]
        else: server = server[0]

        title = "[" + server + "] " + item.fulltitle

        itemlist.append( Item( channel=__channel__, action="play" , title=title, url=scrapedurl, thumbnail=item.thumbnail, fulltitle=item.fulltitle, show=item.show, folder=False ) )

    return itemlist

def play( item ):
    logger.info( "[itafilmtv.py] play" )

    ## Only the url is needed
    data = item.url

    itemlist = servertools.find_video_items( data=data )

    for videoitem in itemlist:
        videoitem.title = item.show
        videoitem.fulltitle = item.fulltitle
        videoitem.thumbnail = item.thumbnail
        videoitem.channel = __channel__

    return itemlist
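
The cookie step described before the channel code can be sketched on its own. This is only a minimal illustration, assuming the block page sets the cookie from an inline script; `sample_page` is an invented stand-in for the real response, the helper names are hypothetical, and the dot in the pattern is escaped here (the post quotes it unescaped). It runs on Python 3 without any pelisalacarta module.

```python
import re

# Pattern quoted in the post: captures everything between document.cookie='
# and the first semicolon.
COOKIE_PATTERN = re.compile(r"document\.cookie='([^;]+);")


def extract_cookie(html):
    """Return the 'name=value' pair set by the block page, or None."""
    match = COOKIE_PATTERN.search(html)
    return match.group(1) if match else None


def with_cookie(headers, cookie):
    """Return a copy of a [name, value] header list with 'Cookie' replaced."""
    updated = [pair for pair in headers if pair[0] != 'Cookie']
    updated.append(['Cookie', cookie])
    return updated


# Hypothetical block page, invented for illustration only.
sample_page = (
    "<script>document.cookie='_ddn_intercept_2_=b33473ad0b70b320a9f7546e213a396a; "
    "max-age=604800; path=/';</script>"
)

cookie = extract_cookie(sample_page)
headers = with_cookie([['Host', 'www.itafilm.tv']], cookie)
```

In a channel, the idea would be to request the page once, pull the cookie out of the response, and repeat the request with the updated header list.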

Re: Javascript emulator?

Posted: 14 Aug 2015, 21:13
by robalo
Since I'll be away for a few days, and in case the question comes up of 'How do I fix the videomega links that this channel returns?', I'm pasting what I use in the 'videomega' connector.

Code:

import re

from core import logger

## Finds this server's videos in the given text
def find_videos( data ):

    encontrados = set()
    devuelve = []

    # http://videomega.tv/iframe.php?ref=....
    # http://videomega.tv/cdn.php?ref=....
    # http://videomega.tv/?ref=....
    # http://videomega.tv/view.php?ref=....

    patterns = [
        'videomega[^/]+/[^=]+=([A-Za-z0-9]+)'
    ]

    for pattern in patterns:

        logger.info( "[videomega.py] find_videos #" + pattern + "#" )
        matches = re.compile( pattern, re.DOTALL ).findall( data )

        for match in matches:
            titulo = "[videomega.tv]"
            if len(match) > 10: match = x2c( match )
            url = "http://videomega.tv/?ref=" + match
            if url not in encontrados:
                logger.info( "  url=" + url )
                devuelve.append( [ titulo, url, 'videomega' ] )
                encontrados.add( url )
            else:
                logger.info( "  url duplicada=" + url )

    return devuelve

## Decodes a 30-digit ref: every three decimal digits are one character code
def x2c( data ):
    x = 0; c = ""
    while x < 30:
        c+= chr( int( data[x:x+3] ) )
        x+= 3
    return c
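
To make the connector logic above easier to follow, here is a small standalone sketch of the same two steps: matching the ref in the four URL forms listed in the comments, and decoding a ref written as three-digit decimal character codes. The refs are invented for illustration and the function names are hypothetical. Unlike x2c, which reads exactly 30 digits, this variant reads the whole string. It runs on Python 3.

```python
import re

# Same pattern as in the connector, with a plain [A-Za-z0-9] character class.
REF_PATTERN = re.compile(r'videomega[^/]+/[^=]+=([A-Za-z0-9]+)')


def decode_decimal_triplets(digits):
    """Turn '072101...' into text, reading three decimal digits per character."""
    return "".join(chr(int(digits[i:i + 3])) for i in range(0, len(digits), 3))


def normalise_ref(ref):
    """Decode refs longer than 10 characters, mirroring the check in find_videos."""
    return decode_decimal_triplets(ref) if len(ref) > 10 else ref


# Invented URLs covering the four forms listed in the connector comments.
urls = [
    "http://videomega.tv/iframe.php?ref=abc123",
    "http://videomega.tv/cdn.php?ref=abc123",
    "http://videomega.tv/?ref=abc123",
    "http://videomega.tv/view.php?ref=abc123",
]
refs = [REF_PATTERN.search(url).group(1) for url in urls]
decoded = normalise_ref("072101108108111")  # 72, 101, 108, 108, 111
```

As in the original, a ref longer than 10 characters that contains letters would make `int()` raise, so a real connector may want to guard that case.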

Re: Javascript emulator?

Posted: 14 Aug 2015, 21:21
by zanzibar1982
Hello robalo,

I checked the file \Kodi\userdata\addon_data\plugin.video.pelisalacarta\cookies.dat
but there is no entry for itafilm.tv ... or am I looking at the wrong file?

Code:

# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This is a generated file!  Do not edit.

.altadefinizione01.com	TRUE	/	FALSE	1470761482	__cfduid	da377764b689f20e893f51c6e9f4df3231439225482
.asiansubita.altervista.org	TRUE	/	FALSE	1470782285	__cfduid	db5547bcd65cda68f78d50aff941b67381439246285
.asiansubita.altervista.org	TRUE	/	FALSE	1441838285	av_device_cookie	computer
.asiansubita.altervista.org	TRUE	/	FALSE	1441838285	av_mobile_cookie	desktop
.casa-cinema.net	TRUE	/	FALSE	1470761482	__cfduid	dacb4ef9c18182d48afda7d3249838b3c1439225482
.cb01.eu	TRUE	/	FALSE	1470761484	__cfduid	d1997e27d7c1907aa05e72e76911247171439225484
.cineblog01.cc	TRUE	/	FALSE	1470953942	__cfduid	ddcf5f3f8b0c7089fa9c319a69f276fdd1439417942
.eurostreaming.tv	TRUE	/	FALSE	1470761490	__cfduid	d8f31d4f16c31af327fe0d75b91deee291439225490
.filmgratis.cc	TRUE	/	FALSE	1470761491	__cfduid	d437de719421dd0e90ec4a80e7247efa41439225491
.filmpertutti.co	TRUE	/	FALSE	1470761494	__cfduid	d927e31bad6a6608f9f623441e9a6db0c1439225494
.filmsenzalimiti.co	TRUE	/	FALSE	1470761495	__cfduid	d7cb5d9e3aecaee09bfaa3ad3bb3c89661439225495
.guardaserie.net	TRUE	/	FALSE	1470761495	__cfduid	de96bfc053c3f02da0cd76679fa49ecb61439225495
.italia-film.co	TRUE	/	FALSE	1470761505	__cfduid	df7045c66f0650dc8784965260a502c9f1439225505
.italiaserie.co	TRUE	/	FALSE	1470761509	__cfduid	dae0595051420a48ab600e1703a89d0f91439225509
.linkdecrypter.com	TRUE	/	FALSE	1470835873	__cfduid	d4d3257220b59508d90310c6343871d121439299873
.piratestreaming.co	TRUE	/	FALSE	1470761518	__cfduid	dc880ac4c0a0e03b0fd529c01676ca1531439225518
.rapidvideo.org	TRUE	/	FALSE	1471094351	__cfduid	df761091e37722e487528f8f3a70d4d301439558351
.youtube.com	TRUE	/	FALSE	1502629520	GEUP	2fbe0cf3197127de5d4fed66197fbb40aQIAAAA=
.youtube.com	TRUE	/	FALSE	1460595499	PREF	f1=50000000
.youtube.com	TRUE	/	FALSE	1460595498	VISITOR_INFO1_LIVE	40MMt-UZYeU
.youtube.com	TRUE	/	FALSE		YSC	7Bc2xnx3aaA
hubberfilm.org	FALSE	/	FALSE	1470761496	HC	4526170
www.nowvideo.li	FALSE	/	FALSE	1893459661	aff	58617
www.nowvideo.sx	FALSE	/	FALSE	1893459661	aff	1371426

I tried your code and it is not working, not even after replacing the cookie with the one
I extracted from Firefox using a plugin.

EDIT:

The Firefox plugin gives me this value for the cookie: 6c0a56014d8cd450904e879e9f02233d

and putting:

Code:

headers = [
    ['Host','www.itafilm.tv'],
    ['User-Agent','Mozilla/5.0 (Windows NT 6.1; WOW64; rv:39.0) Gecko/20100101 Firefox/39.0'],
    ['Accept-Encoding','gzip, deflate'],
    ['Cookie','_ddn_intercept_2_=6c0a56014d8cd450904e879e9f02233d'],
]

still gives me a blank menu. Sorry to bother you :oops: :mrgreen:
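
Checking by eye whether cookies.dat holds an entry for a site is error-prone, so here is a rough sketch of doing it in code. It assumes the tab-separated Netscape layout shown in the listing above; the function names are hypothetical and `sample` reuses two lines from that listing. It runs on Python 3.

```python
def cookie_domains(cookie_file_text):
    """Return the set of domains listed in a Netscape-format cookie file."""
    domains = set()
    for line in cookie_file_text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and the comment header
        fields = line.split('\t')
        if len(fields) >= 7:  # domain, flag, path, secure, expiry, name, value
            domains.add(fields[0].lstrip('.').lower())
    return domains


def has_cookie_for(cookie_file_text, host):
    """True if any stored cookie domain matches host or one of its parents."""
    host = host.lower()
    return any(host == domain or host.endswith('.' + domain)
               for domain in cookie_domains(cookie_file_text))


# Two entries copied from the listing above, plus the header line.
sample = "\n".join([
    "# Netscape HTTP Cookie File",
    ".cb01.eu\tTRUE\t/\tFALSE\t1470761484\t__cfduid\td1997e27d7c1907aa05e72e76911247171439225484",
    "www.nowvideo.li\tFALSE\t/\tFALSE\t1893459661\taff\t58617",
])

found = has_cookie_for(sample, "www.itafilm.tv")
```

With the file contents posted above, a check for www.itafilm.tv finds nothing, which matches the observation that no cookie was stored for that site.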

Re: Javascript emulator?

Posted: 15 Aug 2015, 12:37
by DrZ3r0
I tried with itastreaming.co, but there too a proxy (Cloudflare) prevents Kodi from downloading the pages.

I read what robalo wrote: unfortunately his idea does not work with itastreaming.co, because Cloudflare uses a special mechanism to disguise the cookie (it is generated on each new visit by an encrypted function).

Furthermore, no cookies are saved to the cookies.dat file when I load itastreaming.co.

With a Python script running outside Kodi, using Selenium + PhantomJS, you can get in without trouble and grab all the video links.

Unfortunately PhantomJS is not available for Android / iOS, so it should not be added to Kodi.

Any ideas for Cloudflare?

Re: Javascript emulator?

Posted: 16 Aug 2015, 19:23
by zanzibar1982
Yeah, I have seen your work here https://github.com/Dr-Zer0/pelisalacart ... reaming.py

and even though it is possible to browse the site, it is not possible to
see movie thumbnails because of Cloudflare.

Edit: I see you fixed that for itastreaming.co

Re: Javascript emulator?

Posted: 17 Aug 2015, 12:25
by DrZ3r0
Hi Zanzibar,
I'll explain how cookie management works between Kodi and pelisalacarta.
Kodi writes its cookies to '.kodi/temp/cookies.dat',
whereas pelisalacarta writes its cookies to '.kodi/userdata/addon_data/plugin.video.pelisalacarta/cookies.dat'.
These two files don't share contents, so the cookies stored by pelisalacarta are not seen by Kodi.
Icons and thumbnails are downloaded by Kodi, which fails because it does not have the cookie. (In Python you don't download icons and thumbnails, you just pass their URLs.)
A solution would be to download them into a cache and use the path to the local file instead of the URL.
Perhaps robalo has a better idea to overcome this problem.
Bye.
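
The cache idea above could look roughly like the sketch below. It is only an illustration under assumptions: the function names, the MD5-based file naming and the `fetch` callback are all hypothetical, and it is written for Python 3 (the 2015 plugin code in this thread is Python 2 and would use urllib2 instead).

```python
import hashlib
import os
import urllib.parse
import urllib.request


def fetch_with_cookie(url, cookie):
    """Download url, sending the given Cookie header (needs network access)."""
    request = urllib.request.Request(url, headers={'Cookie': cookie})
    with urllib.request.urlopen(request) as response:
        return response.read()


def cached_thumbnail(url, cache_dir, fetch):
    """Return a local path for url, calling fetch(url) only on first use."""
    os.makedirs(cache_dir, exist_ok=True)
    # Keep the original extension when the URL path has one.
    extension = os.path.splitext(urllib.parse.urlparse(url).path)[1] or '.img'
    name = hashlib.md5(url.encode('utf-8')).hexdigest() + extension
    path = os.path.join(cache_dir, name)
    if not os.path.exists(path):
        with open(path, 'wb') as handle:
            handle.write(fetch(url))
    return path
```

A channel would then set `thumbnail=cached_thumbnail(url, cache_dir, lambda u: fetch_with_cookie(u, cookie))`, so Kodi only ever sees a local file. The cost is that every thumbnail is downloaded while the listing is being built, which may slow down large pages.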

Re: Javascript emulator?

Posted: 17 Aug 2015, 13:42
by zanzibar1982
Thanks for making that clearer, DrZ3r0.

This problem should be solved, because if all sites acted like that we would soon end up with broken channels again :(

Re: Javascript emulator?

Posted: 21 Aug 2015, 18:42
by jesus
For thumbnails, a local proxy can be a simple solution.

This way we can run a proxy on port 8921, for example, and put http://localhost:8921/... as the thumbnail url.

pelisalacarta will download the thumbnails, using the previous cookie, and problem solved.
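
A rough sketch of such a local proxy, using only the Python 3 standard library, is shown below. Everything beyond the port number is an assumption made for illustration: the `?url=` query format, the names `proxy_url`, `target_from_path` and `ThumbnailProxy`, and the cookie value (copied from the header list earlier in the thread as a placeholder).

```python
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

PROXY_PORT = 8921  # port suggested in the post
COOKIE = '_ddn_intercept_2_=b33473ad0b70b320a9f7546e213a396a'  # placeholder value


def proxy_url(thumbnail_url, port=PROXY_PORT):
    """Build the localhost URL that a channel would hand to Kodi."""
    quoted = urllib.parse.quote(thumbnail_url, safe='')
    return 'http://localhost:%d/?url=%s' % (port, quoted)


def target_from_path(path):
    """Recover the remote URL from the request path, or None if absent."""
    query = urllib.parse.urlparse(path).query
    values = urllib.parse.parse_qs(query).get('url')
    return values[0] if values else None


class ThumbnailProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        target = target_from_path(self.path)
        if target is None:
            self.send_error(400, 'missing url parameter')
            return
        request = urllib.request.Request(target, headers={'Cookie': COOKIE})
        try:
            with urllib.request.urlopen(request) as upstream:
                body = upstream.read()
                content_type = upstream.headers.get(
                    'Content-Type', 'application/octet-stream')
        except (OSError, ValueError):  # network failure or malformed URL
            self.send_error(502, 'upstream fetch failed')
            return
        self.send_response(200)
        self.send_header('Content-Type', content_type)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=PROXY_PORT):
    """Run the proxy until interrupted; binds to localhost only."""
    HTTPServer(('127.0.0.1', port), ThumbnailProxy).serve_forever()
```

A channel would pass `thumbnail=proxy_url(scrapedthumbnail)` and start `serve()` in a background thread. Since this handler fetches any URL it is given, a real version should probably restrict targets to the hosts the plugin actually uses.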

For other cases it is usually possible to avoid JavaScript emulation, which is slow and heavy, and just make the server think the JavaScript has been executed. Examples in pelisalacarta are the unpackjs and unwise algorithms, used in several connectors.

If everything else fails, Rhino or PhantomJS (which I didn't know about) come with JavaScript emulators that may help.

BTW, I usually just discard sites that present strong protection; there are lots of alternatives to choose from.

Re: Javascript emulator?

Posted: 21 Aug 2015, 23:58
by zanzibar1982
"BTW, i usually just discard sites that present strong protection, there are lots of alternatives to choose."

I was of the same opinion, then I realized that there are not as many sites with Italian content as there are with Spanish content, so I guess it is worth spending a little more time trying to reverse-engineer at least the sites that have been up for a long time, that keep their older content online, etcetera.

It took almost 5 days to get the guardaserie.net channel working, a website that has most of the TV show content in our language and the best organization; the site was complex, but luckily robalo was able to fix all the issues. Has anybody seen robalo? :)