Mi media center - Foro

Hello

Can anybody explain what I am doing wrong here?

I started this channel from scratch, but I cannot seem to reach the web page; I only get blank fields.

Is my syntax wrong?

Code:

# -*- coding: utf-8 -*-
#------------------------------------------------------------
# pelisalacarta - XBMC Plugin
# Channel for OnePiece Sub Ita
# http://blog.tvalacarta.info/plugin-xbmc/pelisalacarta/
#------------------------------------------------------------
import urlparse,urllib2,urllib,re
import os, sys

from core import logger
from core import config
from core import scrapertools
from core.item import Item
from servers import servertools

__channel__ = "onepieceita"
__category__ = "A"
__type__ = "generic"
__title__ = "OnePiece Sub Ita"
__language__ = "IT"

sito="http://archive.forumcommunity.net/"

DEBUG = config.get_setting("debug")

def isGeneric():
    return True

def mainlist(item):
    logger.info("pelisalacarta.onepieceita mainlist")
    itemlist = []
    itemlist.append( Item(channel=__channel__, title="Lista episodi", action="peliculas", url="http://archive.forumcommunity.net/?t=34189487"))
    itemlist.append( Item(channel=__channel__, title="Prova", action="findvideos", server="streaminto", url="http://archive.forumcommunity.net/?t=46890783"))

    return itemlist
	
def peliculas(item):
    logger.info("pelisalacarta.onepieceita peliculas")
    itemlist = []

    # Download the page
    data = scrapertools.cache_page(item.url)

    # Extract the entries (folders)
    patron = '<br><a href="(.*?)" target="_blank">.*?<span style="color:#3a3a3a">.*?</b></span>(.*?)</a>'
    matches = re.compile(patron,re.DOTALL).findall(data)
    scrapertools.printMatches(matches)

    for scrapedurl,scrapedtitle in matches:
        if (DEBUG): logger.info("url=["+scrapedurl+"], title=["+scrapedtitle+"]")
        itemlist.append( Item(channel=__channel__, action="findvideos", title=scrapedtitle , url=sito+scrapedurl , folder=False ) )

    return itemlist

Try adding this 'headers' list and passing it along with the request.

Code:

[......]
sito="http://archive.forumcommunity.net/"

headers = [
    ['Host','archive.forumcommunity.net'],
    ['User-Agent','Mozilla/5.0 (Windows NT 6.1; WOW64; rv:39.0) Gecko/20100101 Firefox/39.0'],
    ['Accept-Encoding','gzip, deflate'],
    ['Cookie','']
]
[......]
    # Download the page
    data = scrapertools.cache_page( item.url,  headers=headers )
[......]
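
For anyone who wants to check the headers outside the plugin, here is a minimal sketch using only the Python 3 standard library (the channel itself runs on Python 2 with urllib2). `build_request` is a hypothetical helper, not part of scrapertools, and how `scrapertools.cache_page` applies the headers internally is not shown in this thread. Nothing is sent over the network; the sketch only shows what the request would carry.

```python
# Python 3 sketch; the plugin itself targets Python 2 (urllib2).
from urllib.request import Request

SITO = "http://archive.forumcommunity.net/"

# Same header pairs as in the post above.
HEADERS = [
    ["Host", "archive.forumcommunity.net"],
    ["User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:39.0) Gecko/20100101 Firefox/39.0"],
    ["Accept-Encoding", "gzip, deflate"],
    ["Cookie", ""],
]


def build_request(url, header_pairs):
    """Hypothetical helper: turn [name, value] pairs into a urllib Request."""
    request = Request(url)
    for name, value in header_pairs:
        request.add_header(name, value)
    return request


request = build_request(SITO + "?t=34189487", HEADERS)

# header_items() lists the headers attached to the request,
# without touching the network.
for name, value in request.header_items():
    print(name + ": " + value)
```

Note that asking for 'gzip, deflate' allows the server to answer with a compressed body, which has to be decompressed before a regex can match anything in it. Whether scrapertools does that for you is not stated here, so it may be worth logging the length and first characters of `data`.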

robalo,

no luck this time, but thanks anyway.

I also think I am doing something wrong, so I have to triple-check.

Furthermore, I think there are too many elements in the page, so even if we could extract the entries, I doubt a list of 400-500 items would be comfortable to browse. I guess it is going to be too heavy, so I'm going to think about another solution.

I can guarantee that with that header it does not return data = "". data is filled with the content of the page.

In any case, the forum is very chaotic and you would almost have to create one pattern per series; there does not seem to be any rule. Each user places the links however they like.

If you were planning to build a list from the content of "Lista Anime dalla A alla Z" ( ?t=55769113 ), that is easy to do and the list of 735 series loads fairly quickly. The problem comes afterwards: no two threads are laid out the same way, so you would have to create a pattern per series, or nearly so.
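
To illustrate the "one pattern per series" problem, here is a rough sketch that tries several candidate patterns in order and keeps the first one that matches. It is written for Python 3 with the standard library only. The first pattern is the one from the channel above; the second pattern, the `extract_entries` helper and both HTML fragments are made up for illustration and are not taken from the real forum pages.

```python
import re
from urllib.parse import urljoin

SITO = "http://archive.forumcommunity.net/"

# First pattern is the one from the channel above; the second is a
# hypothetical alternative for threads that use plain links.
CANDIDATE_PATTERNS = (
    '<br><a href="(.*?)" target="_blank">.*?<span style="color:#3a3a3a">.*?</b></span>(.*?)</a>',
    '<a href="(.*?)" target="_blank">(.*?)</a>',
)


def extract_entries(data, patterns=CANDIDATE_PATTERNS):
    """Return (url, title) pairs from the first pattern that matches anything."""
    for patron in patterns:
        matches = re.compile(patron, re.DOTALL).findall(data)
        if matches:
            return [(urljoin(SITO, url), title.strip()) for url, title in matches]
    return []  # always a list, even when nothing matches


# Made-up HTML fragments standing in for two differently formatted threads.
SAMPLE_STYLED = (
    '<br><a href="?t=111" target="_blank"><span style="color:#3a3a3a">'
    '<b>001</b></span> Primo episodio</a>'
)
SAMPLE_PLAIN = '<a href="?t=222" target="_blank">Episodio 002</a>'

print(extract_entries(SAMPLE_STYLED))
print(extract_entries(SAMPLE_PLAIN))
print(extract_entries("<p>no links here</p>"))
```

Testing patterns like this against a saved copy of the page makes it easier to tell a download problem (empty data) from a pattern problem (data present, no matches). With 735 series the list of patterns could still grow large, which is the real obstacle.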

Yes, there is too much confusion to extract data from that forum... Thanks for taking the trouble, robalo.

OnePiece Anime Sub Ita channel help
