MongoDB: Is this kind of data a good fit for MongoDB?

xiaoronglv · October 26, 2012 · Last reply by xiaoronglv on November 2, 2012 · 4050 views

The data is almost entirely scientific papers in XML format. I want to convert these articles to BSON and store them in MongoDB.

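Deep nesting shouldn't be a problem, right? The records are nested about 5 levels deep, so writing the model will probably be exhausting.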

<PubmedArticle>
    <MedlineCitation Owner="NLM" Status="PubMed-not-MEDLINE">
        <PMID Version="1">23094252</PMID>
        <DateCreated>
            <Year>2012</Year>
            <Month>10</Month>
            <Day>24</Day>
        </DateCreated>
        <DateCompleted>
            <Year>2012</Year>
            <Month>10</Month>
            <Day>25</Day>
        </DateCompleted>
        <Article PubModel="Print-Electronic">
            <Journal>
                <ISSN IssnType="Electronic">2234-6171</ISSN>
                <JournalIssue CitedMedium="Internet">
                    <Volume>39</Volume>
                    <Issue>5</Issue>
                    <PubDate>
                        <Year>2012</Year>
                        <Month>Sep</Month>
                    </PubDate>
                </JournalIssue>
                <Title>Archives of plastic surgery</Title>
                <ISOAbbreviation>Arch Plast Surg</ISOAbbreviation>
            </Journal>
            <ArticleTitle>Usefulness of Intravenous Anesthesia Using a Target-controlled Infusion System with Local Anesthesia in Submuscular Breast Augmentation Surgery.</ArticleTitle>
            <Pagination>
                <MedlinePgn>540-5</MedlinePgn>
            </Pagination>
            <ELocationID EIdType="doi" ValidYN="Y">10.5999/aps.2012.39.5.540</ELocationID>
            <Abstract>
                <AbstractText Label="BACKGROUND" NlmCategory="BACKGROUND">Patients have anxiety and fear of complications due to general anesthesia. Through new instruments and local anesthetic drugs, a variety of anesthetic methods have been introduced. These methods keep hospital costs down and save time for patients. In particular, the target-controlled infusion (TCI) system maintains a relatively accurate level of plasma concentration, so the depth of anesthesia can be adjusted more easily. We conducted this study to examine whether intravenous anesthesia using the TCI system with propofol and remifentanil would be an effective method of anesthesia in breast augmentation.</AbstractText>
                <AbstractText Label="METHODS" NlmCategory="METHODS">This study recruited 100 patients who underwent breast augmentation surgery from February to August 2011. Intravenous anesthesia was performed with 10 mg/mL propofol and 50 µg/mL remifentanil simultaneously administered using two separate modules of a continuous computer-assisted TCI system. The average target concentration was set at 2 µg/mL and 2 ng/mL for propofol and remifentanil, respectively, and titrated against clinical effect and vital signs. Oxygen saturation, electrocardiography, and respiratory status were continuously measured during surgery. Blood pressure was measured at 5-minute intervals. Information collected includes total duration of surgery, dose of drugs administered during surgery, memory about surgery, and side effects.</AbstractText>
                <AbstractText Label="RESULTS" NlmCategory="RESULTS">Intraoperatively, there was transient hypotension in two cases and hypoxia in three cases. However, there were no serious complications due to anesthesia such as respiratory difficulty, deep vein thrombosis, or malignant hypertension, for which an endotracheal intubation or reversal agent would have been needed. All the patients were discharged on the day of surgery and able to ambulate normally.</AbstractText>
                <AbstractText Label="CONCLUSIONS" NlmCategory="CONCLUSIONS">Our results indicate that anesthetic methods, where the TCI of propofol and remifentanil is used, might replace general anesthesia with endotracheal intubation in breast augmentation surgery.</AbstractText>
            </Abstract>
            <Affiliation>Department of Plastic and Reconstructive Surgery, Yeungnam University Hospital, Yeungnam University College of Medicine, Daegu, Korea.</Affiliation>
            <AuthorList CompleteYN="Y">
                <Author ValidYN="Y">
                    <LastName>Chung</LastName>
                    <ForeName>Kyu-Jin</ForeName>
                    <Initials>KJ</Initials>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Cha</LastName>
                    <ForeName>Kyu-Ho</ForeName>
                    <Initials>KH</Initials>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Lee</LastName>
                    <ForeName>Jun-Ho</ForeName>
                    <Initials>JH</Initials>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Kim</LastName>
                    <ForeName>Yong-Ha</ForeName>
                    <Initials>YH</Initials>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Kim</LastName>
                    <ForeName>Tae-Gon</ForeName>
                    <Initials>TG</Initials>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Kim</LastName>
                    <ForeName>Il-Guk</ForeName>
                    <Initials>IG</Initials>
                </Author>
            </AuthorList>
            <Language>eng</Language>
            <PublicationTypeList>
                <PublicationType>Journal Article</PublicationType>
            </PublicationTypeList>
            <ArticleDate DateType="Electronic">
                <Year>2012</Year>
                <Month>09</Month>
                <Day>12</Day>
            </ArticleDate>
        </Article>
        <MedlineJournalInfo>
            <Country>Korea (South)</Country>
            <MedlineTA>Arch Plast Surg</MedlineTA>
            <NlmUniqueID>101577999</NlmUniqueID>
            <ISSNLinking>2234-6163</ISSNLinking>
        </MedlineJournalInfo>
    </MedlineCitation>
    <PubmedData>
        <History>
            <PubMedPubDate PubStatus="received">
                <Year>2012</Year>
                <Month>5</Month>
                <Day>31</Day>
            </PubMedPubDate>
            <PubMedPubDate PubStatus="revised">
                <Year>2012</Year>
                <Month>8</Month>
                <Day>14</Day>
            </PubMedPubDate>
            <PubMedPubDate PubStatus="accepted">
                <Year>2012</Year>
                <Month>8</Month>
                <Day>14</Day>
            </PubMedPubDate>
            <PubMedPubDate PubStatus="epublish">
                <Year>2012</Year>
                <Month>9</Month>
                <Day>12</Day>
            </PubMedPubDate>
            <PubMedPubDate PubStatus="entrez">
                <Year>2012</Year>
                <Month>10</Month>
                <Day>25</Day>
                <Hour>6</Hour>
                <Minute>0</Minute>
            </PubMedPubDate>
            <PubMedPubDate PubStatus="pubmed">
                <Year>2012</Year>
                <Month>10</Month>
                <Day>25</Day>
                <Hour>6</Hour>
                <Minute>0</Minute>
            </PubMedPubDate>
            <PubMedPubDate PubStatus="medline">
                <Year>2012</Year>
                <Month>10</Month>
                <Day>25</Day>
                <Hour>6</Hour>
                <Minute>0</Minute>
            </PubMedPubDate>
        </History>
        <PublicationStatus>ppublish</PublicationStatus>
        <ArticleIdList>
            <ArticleId IdType="doi">10.5999/aps.2012.39.5.540</ArticleId>
            <ArticleId IdType="pubmed">23094252</ArticleId>
        </ArticleIdList>
    </PubmedData>
</PubmedArticle>

Orz... You don't have to store the data in exactly the shape it arrives in. Restructure it to fit your own needs.
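As a rough illustration of restructuring on import, here is a minimal Python sketch that pulls a handful of fields out of a record shaped like the PubmedArticle above and builds a flatter document. The function name `flatten_article`, the output key names, and the choice of fields are all assumptions made for this example. The sample is trimmed, and the abstract sections are cut down to their first sentence.

```python
import xml.etree.ElementTree as ET

# Trimmed sample in the same shape as the PubmedArticle record above
# (only two authors, abstract sections cut to their first sentence).
SAMPLE = """
<PubmedArticle>
  <MedlineCitation Owner="NLM" Status="PubMed-not-MEDLINE">
    <PMID Version="1">23094252</PMID>
    <Article PubModel="Print-Electronic">
      <Journal>
        <Title>Archives of plastic surgery</Title>
      </Journal>
      <ArticleTitle>Usefulness of Intravenous Anesthesia Using a Target-controlled Infusion System with Local Anesthesia in Submuscular Breast Augmentation Surgery.</ArticleTitle>
      <Abstract>
        <AbstractText Label="BACKGROUND">Patients have anxiety and fear of complications due to general anesthesia.</AbstractText>
        <AbstractText Label="RESULTS">Intraoperatively, there was transient hypotension in two cases and hypoxia in three cases.</AbstractText>
      </Abstract>
      <AuthorList CompleteYN="Y">
        <Author ValidYN="Y"><LastName>Chung</LastName><ForeName>Kyu-Jin</ForeName><Initials>KJ</Initials></Author>
        <Author ValidYN="Y"><LastName>Cha</LastName><ForeName>Kyu-Ho</ForeName><Initials>KH</Initials></Author>
      </AuthorList>
    </Article>
  </MedlineCitation>
</PubmedArticle>
"""


def flatten_article(xml_text):
    """Build a flat dict from a PubmedArticle record (hypothetical layout)."""
    root = ET.fromstring(xml_text.strip())
    citation = root.find("MedlineCitation")
    article = citation.find("Article")
    return {
        "pmid": citation.findtext("PMID"),
        "title": article.findtext("ArticleTitle"),
        "journal": article.findtext("Journal/Title"),
        # One small dict per author instead of the nested AuthorList/Author.
        "authors": [
            {"last": a.findtext("LastName"), "fore": a.findtext("ForeName")}
            for a in article.findall("AuthorList/Author")
        ],
        # Keyed by the Label attribute; this assumes every section is labeled.
        "abstract": {
            t.get("Label"): t.text
            for t in article.findall("Abstract/AbstractText")
        },
    }


doc = flatten_article(SAMPLE)
print(doc["pmid"], doc["journal"], [a["last"] for a in doc["authors"]])
```

The resulting dict holds only strings, lists, and nested dicts, so it is the kind of value a MongoDB driver can store as a document. Which fields to keep depends on how the data will be queried.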

It's a very good fit.

Post #2 @Rei

I'm done agonizing over this. I'm going to learn MongoDB properly.

Post #2 @Rei

Today I suddenly discovered that there is another paper structure. Orz. Can a single model hold documents with different structures (schemas)?

<BookDocument>
       <PMID Version="1">23101096</PMID>
       <ArticleIdList>
           <ArticleId IdType="bookaccession">NBK84157</ArticleId>
       </ArticleIdList>
       <Book>
           <Publisher>
               <PublisherName>Institute for Quality and Efficiency in Health Care (IQWiG)</PublisherName>
               <PublisherLocation>Cologne, Germany</PublisherLocation>
           </Publisher>
           <BookTitle book="iqwigsum">Institute for Quality and Efficiency in Health Care: Executive Summaries</BookTitle>
           <PubDate>
               <Year>2005</Year>
           </PubDate>
           <BeginningDate>
               <Year>2005</Year>
           </BeginningDate>
           <Medium>Internet</Medium>
       </Book>
       <ArticleTitle book="iqwigsum" part="a0507">Benefit assessment of long-term blood glucose lowering to near-normal levels in patients with type 2 diabetes mellitus: Executive summary of final report A05-07, Version 1.0</ArticleTitle>
       <Language>eng</Language>
       <Abstract>
           <AbstractText>The aim of the present investigation is the benefit assessment of measures with the goal of long-term adjustment of BG to near-normal levels compared to measures with no goal or a less intensive goal of BG adjustment in patients with type 2 diabetes mellitus in respect of patient-relevant outcomes.</AbstractText>
           <CopyrightInformation>© IQWiG (Institute for Quality and Efficiency in Health Care).</CopyrightInformation>
       </Abstract>
       <Sections>
           <Section>
               <SectionTitle book="iqwigsum" part="a0507" sec="a0507.s2">Background</SectionTitle>
           </Section>
           <Section>
               <SectionTitle book="iqwigsum" part="a0507" sec="a0507.s3">Aim of the investigation</SectionTitle>
           </Section>
           <Section>
               <SectionTitle book="iqwigsum" part="a0507" sec="a0507.s4">Methods</SectionTitle>
           </Section>
           <Section>
               <SectionTitle book="iqwigsum" part="a0507" sec="a0507.s5">Results</SectionTitle>
           </Section>
           <Section>
               <SectionTitle book="iqwigsum" part="a0507" sec="a0507.s6">Conclusions</SectionTitle>
           </Section>
       </Sections>
       <ContributionDate>
           <Year>2011</Year>
           <Month>06</Month>
           <Day>06</Day>
       </ContributionDate>
   </BookDocument>
   <PubmedBookData>
       <History>
           <PubMedPubDate PubStatus="pubmed">
               <Year>2012</Year>
               <Month>10</Month>
               <Day>27</Day>
               <Hour>6</Hour>
               <Minute>0</Minute>
           </PubMedPubDate>
           <PubMedPubDate PubStatus="medline">
               <Year>2012</Year>
               <Month>10</Month>
               <Day>27</Day>
               <Hour>6</Hour>
               <Minute>0</Minute>
           </PubMedPubDate>
           <PubMedPubDate PubStatus="entrez">
               <Year>2012</Year>
               <Month>10</Month>
               <Day>27</Day>
               <Hour>6</Hour>
               <Minute>0</Minute>
           </PubMedPubDate>
       </History>
       <PublicationStatus>ppublish</PublicationStatus>
       <ArticleIdList>
           <ArticleId IdType="pubmed">23101096</ArticleId>
       </ArticleIdList>
   </PubmedBookData>

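MongoDB is schema-less. But if you have already decided to use MongoDB, why ask whether it is suitable? For the conversion, XSLT should be enough. Also, to learn MongoDB you could take a look at the 10gen courses that started recently.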
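For anyone who would rather not write a stylesheet, a generic conversion can also be sketched in a few lines of Python. Note that this is a different technique from the XSLT suggested above, not an implementation of it. The conventions used here are assumptions of this sketch: attributes are stored under `@name`, the text of an element that also carries attributes goes under `#text`, and repeated child tags are collected into a list.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Recursively convert an Element into nested dicts, lists and strings."""
    # Attributes get an "@" prefix so they cannot clash with child tag names.
    node = {"@" + key: value for key, value in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # Second or later occurrence of a tag: collect values in a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:
        return text  # leaf element without attributes becomes a plain string
    if text:
        node["#text"] = text
    return node


date_xml = "<DateCreated><Year>2012</Year><Month>10</Month><Day>24</Day></DateCreated>"
ids_xml = (
    "<ArticleIdList>"
    '<ArticleId IdType="doi">10.5999/aps.2012.39.5.540</ArticleId>'
    '<ArticleId IdType="pubmed">23094252</ArticleId>'
    "</ArticleIdList>"
)

date_doc = element_to_dict(ET.fromstring(date_xml))
ids_doc = element_to_dict(ET.fromstring(ids_xml))
print(date_doc)
print(ids_doc)
```

Two caveats about this sketch: every value stays a string (dates and numbers would need explicit conversion), and a tag that occurs only once is not wrapped in a list, so a record with one author has a different shape from a record with six. Five levels of nesting is not a concern in itself, since MongoDB documents nesting of up to 100 levels for BSON documents.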

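Post #4 @xiaoronglv Yes, it can. A typical web application doesn't necessarily need MongoDB's schemaless design, but your requirement of storing documents with different structures is about as good a fit as it gets.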
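To make the idea of mixed structures concrete, here is a minimal Python sketch that turns both record types from this thread into dicts with different fields plus a discriminator field. The names `to_document`, `BUILDERS`, and the `kind` field are assumptions of this example, not something MongoDB requires. The samples are trimmed versions of the two records posted above.

```python
import xml.etree.ElementTree as ET

ARTICLE = """
<PubmedArticle>
  <MedlineCitation Owner="NLM" Status="PubMed-not-MEDLINE">
    <PMID Version="1">23094252</PMID>
    <Article PubModel="Print-Electronic">
      <Journal><Title>Archives of plastic surgery</Title></Journal>
    </Article>
  </MedlineCitation>
</PubmedArticle>
"""

BOOK = """
<BookDocument>
  <PMID Version="1">23101096</PMID>
  <Book>
    <BookTitle book="iqwigsum">Institute for Quality and Efficiency in Health Care: Executive Summaries</BookTitle>
  </Book>
  <Sections>
    <Section><SectionTitle book="iqwigsum" part="a0507" sec="a0507.s2">Background</SectionTitle></Section>
    <Section><SectionTitle book="iqwigsum" part="a0507" sec="a0507.s4">Methods</SectionTitle></Section>
  </Sections>
</BookDocument>
"""


def article_fields(root):
    """Fields that only make sense for journal articles."""
    citation = root.find("MedlineCitation")
    return {
        "pmid": citation.findtext("PMID"),
        "journal": citation.findtext("Article/Journal/Title"),
    }


def book_fields(root):
    """Fields that only make sense for book documents."""
    return {
        "pmid": root.findtext("PMID"),
        "book_title": root.findtext("Book/BookTitle"),
        "sections": [s.findtext("SectionTitle")
                     for s in root.findall("Sections/Section")],
    }


# Dispatch on the root tag; unknown record types are rejected explicitly.
BUILDERS = {"PubmedArticle": article_fields, "BookDocument": book_fields}


def to_document(xml_text):
    root = ET.fromstring(xml_text.strip())
    builder = BUILDERS.get(root.tag)
    if builder is None:
        raise ValueError(f"unsupported record type: {root.tag}")
    doc = builder(root)
    doc["kind"] = root.tag  # discriminator so queries can tell the shapes apart
    return doc


docs = [to_document(ARTICLE), to_document(BOOK)]
for d in docs:
    print(d["kind"], sorted(d))
```

Both dicts could then be inserted into the same collection, for example with PyMongo's `insert_one`, because the collection itself does not enforce a shared set of fields. The application code still has to know which fields exist for which `kind`.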

Start straight from MongoDB's official documentation.

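The original poster needs to make one key point clear: how are you likely to use this XML data? I suggest spelling that out, otherwise the advice you get may not suit you.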

For this kind of application, it has to be NoSQL.

Post #8 @huangzhichong

The two data formats are different, and I don't know how to write the model yet.
